STATISTICAL TECHNIQUES IN APPLIED PSYCHOLOGY
By E. G. CHAMBERS, From the Psychological Laboratory, Cambridge 


Psychological research work raises some statistical problems of a nature not usually en- 
countered in biometric and economic studies, and the experimenter is frequently faced with 
serious difficulties in choosing suitable statistical methods for the treatment of his data. 
Naturally he wishes to make the treatment as exact and as fruitful as possible, and often
the temptation to use modern methods, such as variate analysis or factor analysis, proves 
irresistible, notwithstanding the facts that the original data may be rather nebulous and 
that the results of statistical analysis have still to be interpreted in psychological terms. The 
question as to how far modern statistical techniques are legitimately applicable to psycho- 
logical data is becoming increasingly important. This short paper is an attempt to indicate 
some of the difficulties involved and perhaps to interest statisticians in this field of 
endeavour. 

The material collected by psychologists usually falls into one of a few categories. First 
there is the class of measurements. These are generally test scores, and though it may be 
begging an important psychological question to call them ‘measurements’ at all, yet little 
harm may be done by treating such data by the ordinary correlational and analytic techniques,
provided always, of course, that the data satisfy the usual requirements of distribution,
etc. Even here, however, the critical investigator will ask himself whether the more
elaborate and imposing techniques really do add to the information gained from the use of 
simpler methods. 

A second type of data consists of rankings. For example, a group of subjects may each be 
ranked according to his degree of possession of some psychological attribute or attributes. 
These rankings are usually made by one or more judges and are based on personal judge- 
ments. Now there is an increasing tendency for investigators to transform such ranked 
material into ‘normally distributed’ data, which are then subjected to product-moment 
correlation, variate analysis, or what you will. There are two commonly used methods of 
effecting this transformation. One method is to use the table giving scores for ordinal data
in Fisher and Yates’ Statistical Tables for Biological, Agricultural and Medical Research. 
In the other method the ranked data are divided into groups, the frequencies of which follow 
the normal scale more or less closely, and the groups are then allotted scores on a linear 
scale. For instance, in a recent piece of work the investigator divided a ranked group into
seven subgroups containing respectively 5, 10, 20, 30, 20, 10 and 5% of the individuals, and
to these subgroups he then assigned the marks -3, -2, -1, 0, 1, 2 and 3. This then left him
with a set of ‘normally distributed’ scores for some psychological attribute, which, with
other similar sets, he used for producing a matrix of correlation coefficients, which in turn
was subjected to factor analysis.
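
The grouping just described can be sketched in a few lines of Python, purely to make the transformation concrete. The percentages and marks are those quoted above. The function name and the midpoint rule used to place a rank on the 0 to 1 scale are assumptions of this sketch, since the text does not say how boundary cases were settled.

```python
from collections import Counter

# Group shares (per cent) and marks as quoted in the text.
GROUP_PERCENTAGES = [5, 10, 20, 30, 20, 10, 5]
GROUP_MARKS = [-3, -2, -1, 0, 1, 2, 3]


def marks_from_ranks(n_subjects):
    """Return the mark for each rank position, lowest-ranked subject first.

    Assumption: rank position i (1-based) is placed at (i - 0.5) / n on the
    0..1 scale and falls into the first group whose cumulative share exceeds it.
    """
    # Cumulative upper boundaries of the seven groups as fractions.
    boundaries = []
    running = 0
    for share in GROUP_PERCENTAGES:
        running += share
        boundaries.append(running / 100.0)

    marks = []
    for position in range(1, n_subjects + 1):
        fraction = (position - 0.5) / n_subjects
        for upper, mark in zip(boundaries, GROUP_MARKS):
            if fraction < upper:
                marks.append(mark)
                break
    return marks


if __name__ == "__main__":
    # With 100 subjects the group sizes equal the percentages themselves.
    print(Counter(marks_from_ranks(100)))
```

The sketch shows that the resulting ‘scores’ are fixed entirely by rank position and the chosen percentages; nothing about the attribute itself enters the calculation.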

It seems to the present writer that here we have strayed a long way from the original
ranked data, and that some of the steps taken are very difficult to justify. In the first place 
the original rankings were based on personal judgements, and we have no sort of guarantee 
that the judge was capable of making correct rankings for the psychological attributes under 
consideration or that he maintained a consistent standard of judgement throughout the
whole range. Assuming, however, that he was capable of making accurate assessments, we
still cannot know what it actually was that he was assessing, even though he called it 
‘initiative’ or ‘conscientiousness’ or whatever it was supposed to be. Further, in the absence 
of definite evidence from some other source, it is very doubtful whether we have the right to 
assume that psychological attributes are normally distributed in a selected population (the 
subjects in this instance were scholars at a particular school), so that the artificially produced 
set of ‘normally distributed’ scores may indeed have no counterpart in actuality. In view 
of these considerations it is extremely difficult to interpret in psychological terms 
any mathematical factors found in the matrix of correlation coefficients finally 
achieved. 

There is another objection to normalizing ranked data which is not commonly realized. 
Unless the ranking is obtained by the use of some metric we cannot know that the intervals 
between successive ranks are equal; indeed, it is unlikely that they are. Errors of judgement 
will, however, tend to be equal at all points of the scale, so that the effect of normalizing the 
rankings will be to alter the relative numerical value of observational errors at different 
parts of the scale. Moreover, the variance of the normalized scores will not be the same as 
that of the original observational material, and in any analysis of this variance the effect 
we wish to isolate may have been distorted or even entirely masked by the process of 
normalizing.

A third type of psychological data is produced by getting some judge to assess individuals 
on a five- or ten-point scale according to their possession of some quality. It might be, for
example, that a foreman in a factory is asked to assess his subordinates on their ‘co-operativeness’
or on their ‘efficiency’. There are, of course, psychological difficulties involved in this
process, but it is not the purpose of this paper to examine these. It is the way such data are
treated statistically which is our concern here. Let us suppose that a group of workers are 
each assessed as A, B, C, D or E for ‘efficiency’, A signifying ‘extremely efficient’ and E 
‘extremely inefficient’. The question then frequently arises, how are these assessments 
related to the scores on some selective test? All too often this problem is tackled by trans- 
forming the literal grades into numerical scores by taking A as worth 5 marks, B as worth 4, 
and so on. These scores are then treated by any modern statistical technique that takes 
the investigator’s fancy, frequently quite regardless of the fact that the numerical scores may
be markedly leptokurtic or badly skewed in distribution. The nature of the true distribution 
of ‘efficiency’ and the fact that the assessments are the more or less imperfect judgements of 
someone who is usually untrained in making such judgements are points which are too often 
forgotten, the neatness of the mathematical techniques used lending a spurious appearance 
of accuracy to the whole proceeding. 

The statistical treatment of these various sorts of psychological material is no mere 
academic matter but a vital practical problem, particularly at the present time when we are 
faced with rehabilitation and reorganization of labour on a large scale. Tests for industrial 
selection are becoming increasingly important, and it is essential to have some statistical 
methods of proving their validity. Unfortunately, it is extremely difficult to obtain adequate 
validating criteria from industry, and very often personal assessments of the sort described 
above are all that are available. This is a fact which cannot be burked, and in the writer’s 
opinion no benefit is obtained by attempts to treat such assessments and rankings as other 
than what they are, particularly by attempts to transform them into exact numerical data 
in an artificially produced shape. There are, however, certain simple statistical methods which
do not make the assumptions involved in many modern techniques and whose use is not
open to the objections briefly mentioned above. These methods are chiefly due to M. G. Kendall,
sometimes in collaboration with B. Babington Smith, and have mostly been described
in earlier issues of Biometrika (Kendall, 1938, 1942; Kendall & Babington Smith, 1939, 1940).
They are a method of rank correlation, yielding a coefficient whose significance may readily 
be tested, the method of paired comparisons and a method of testing the agreement between 
several judges. These methods have already been used with fruitful results by the Unit for 
Applied Psychology at Cambridge in the field of industry. It is believed that a more reliable 
ranking of abilities and attributes may be obtained by the paired comparisons technique 
than by any other method, especially as the method carries with it its own estimate of a 
judge’s consistency of judgement. Further, a psychologically untrained person may easily 
be able to compare pairs of individuals as regards some quality, whereas he would find it 
difficult if not impossible to rank all the members of even a relatively small group. The 
Kendall method of rank correlation has certain advantages over the Spearman method, 
since fresh material may be added from time to time without having to re-rank at each stage, 
and also since it allows the calculation of partial rank correlation coefficients. 

An example of the use of these methods in dealing with an industrial problem may be of 
interest. A certain firm asked for help in the selection of foremen, preferably help in the form 
of a psychological test which the management itself could administer to candidates. The 
first stage in the inquiry was to seek information from those qualified to give an opinion as 
to the most important qualities involved in good foremanship. From the many suggestions 
made, six qualities were taken as being the most important requisites of good foremanship 
and the most representative of the general enlightened opinion. These qualities were: 


(1) Ability to get on with the workers. 
(2) Co-operation with the management. 
(3) Technical knowledge. 

(4) Organizing ability. 

(5) Ability to maintain discipline. 

(6) Initiative and improvisation. 


The next step was to investigate how far existing foremen showed differences in respect of 
these qualities. Of the various possible ways of attempting this, the method of allotting
numerical scores for the degree of possession of each quality and the method of ranking the 
whole group of foremen for each quality were immediately rejected as unjustifiable and 
dangerous, since there was no way of checking the validity of such scores or rankings. The 
method of paired comparisons, however, seemed ideal for the purpose. There were ten foremen
in the group and three judges were chosen who knew them all well enough to justify
the making of comparisons between them. Each judge had to make 45 (i.e. $\tfrac{1}{2}n(n-1)$) comparisons
between all possible pairs of foremen for each quality. The lists of comparisons were
then examined for circular triads (e.g. A judged better than B, B better than C and C better
than A), and coefficients of consistency calculated from the formula

$$\zeta = 1 - \frac{24d}{n^3 - 4n},$$

where d = number of triads (Kendall, 1943, p. 425). The results of this were as follows:

             Judge A          Judge B          Judge C
Quality     d      ζ         d      ζ         d      ζ

   1        0    1.0         0    1.0         3    0.925
   2        0    1.0         2    0.951       3    0.925
   3        0    1.0         0    1.0         1    0.975
   4        1    0.975       6    0.851       1    0.975
   5        0    1.0         0    1.0         6    0.851
   6        1    0.975       2    0.951       4    0.911

This indicates that each judge was highly consistent in his judgements, especially judge A. 
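
As an illustration of the consistency check, the following Python sketch counts circular triads by direct enumeration and converts the count into the coefficient of consistency. The function names and the boolean preference-matrix layout are assumptions made for this sketch. The denominator for an odd number of objects, $n^3 - n$, is Kendall's companion to the even-number formula used above.

```python
from itertools import combinations


def preference_matrix(order):
    """Build a matrix with prefers[i][j] True when i stands before j in `order`."""
    n = len(order)
    place = {item: k for k, item in enumerate(order)}
    return [[i != j and place[i] < place[j] for j in range(n)] for i in range(n)]


def count_circular_triads(prefers):
    """Count triples a, b, c whose three preferences form a cycle."""
    d = 0
    for a, b, c in combinations(range(len(prefers)), 3):
        clockwise = prefers[a][b] and prefers[b][c] and prefers[c][a]
        anticlockwise = prefers[b][a] and prefers[c][b] and prefers[a][c]
        if clockwise or anticlockwise:
            d += 1
    return d


def coefficient_of_consistency(d, n):
    """Kendall's coefficient: 1 - 24d/(n^3 - 4n) for n even, 1 - 24d/(n^3 - n) for n odd."""
    denominator = n ** 3 - 4 * n if n % 2 == 0 else n ** 3 - n
    return 1 - 24 * d / denominator


if __name__ == "__main__":
    # Ten objects judged in perfect order, then one preference reversed (0 versus 2).
    prefers = preference_matrix(list(range(10)))
    prefers[0][2], prefers[2][0] = False, True
    d = count_circular_triads(prefers)
    print(d, coefficient_of_consistency(d, 10))
```

For ten objects the formula reduces to $1 - d/40$, so a single circular triad gives 0.975 and three give 0.925.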
Next, the agreement between the three judges was examined by the calculation of a
coefficient of agreement (Kendall, 1943, p. 427). The coefficient, u, is given by

$$u = \frac{2\Sigma}{\binom{m}{2}\binom{n}{2}} - 1,$$

where m = number of judges, n = number of objects judged, $\Sigma$ = total number of agreements
between judges. The significance of this coefficient is examined by calculating $\chi^2$
and finding P for the appropriate number of degrees of freedom. P in this instance gives the
probability that the observed value of $\Sigma$ would be attained or exceeded by chance if
preferences were assigned at random.
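
A minimal Python sketch of the coefficient of agreement follows. It assumes each judge's paired comparisons are held as a boolean matrix (an arrangement chosen for this sketch, not taken from the paper) and counts $\Sigma$ as the number of pairs of judges who agree, summed over all ordered pairs of objects. The significance test is not attempted here.

```python
from math import comb


def preference_matrix(order):
    """Build a matrix with prefers[i][j] True when i stands before j in `order`."""
    n = len(order)
    place = {item: k for k, item in enumerate(order)}
    return [[i != j and place[i] < place[j] for j in range(n)] for i in range(n)]


def coefficient_of_agreement(judges):
    """Return u = 2*Sigma / (C(m,2) * C(n,2)) - 1 for m judges and n objects."""
    m = len(judges)
    n = len(judges[0])
    sigma = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Number of judges preferring i to j; each pair of them is one agreement.
            votes = sum(1 for prefers in judges if prefers[i][j])
            sigma += comb(votes, 2)
    return 2 * sigma / (comb(m, 2) * comb(n, 2)) - 1


if __name__ == "__main__":
    unanimous = [preference_matrix([0, 1, 2, 3])] * 3
    print(coefficient_of_agreement(unanimous))  # complete agreement gives 1.0
```

With complete agreement every pair of judges agrees on every comparison and u = 1; with three judges the smallest possible value is -1/3, reached when every comparison splits two to one.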

The results yielded were as under:

Quality      u         P

   1       0.56     <0.0001
   2       0.53     <0.0001
   3       0.73     <0.0001
   4       0.41      0.0001
   5       0.20      0.014
   6       0.64     <0.0001

On the whole, the three judges agreed with one another fairly well, except in the cases of 
quality 5 (Ability to maintain discipline), where the agreement is not good, and quality 4 
(Organizing ability), where the agreement, though quite significant, is only fair. 

A further method of comparing the agreement of the judges was possible. From the 
paired comparisons lists the ten foremen were ranked for each quality according to the 
judgements of each judge. These rankings were then correlated for each pair of judges, 
using Kendall’s ranking method to produce $\tau$ coefficients. The following table shows the
values of $\tau$ obtained, those in brackets being insignificant:

Quality    Judges A and B    Judges A and C    Judges B and C

   1            0.60              0.52              0.57
   2            0.57              0.48              0.67
   3            0.69              0.68              0.83
   4           (0.24)             0.50              0.46
   5          (-0.07)            (0.20)             0.52
   6            0.66              0.73              0.73

These coefficients confirm the findings of the previous table, showing that the agreement
between the judges is good except in the cases of qualities 5 and 4.
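
For readers who wish to reproduce such coefficients, a short Python sketch of Kendall's $\tau$ for two untied rankings is given here. The function name is hypothetical, and ties are assumed absent; tied ranks would need the adjusted form of the coefficient.

```python
from itertools import combinations


def kendall_tau(ranks_a, ranks_b):
    """Kendall's tau for two rankings of the same individuals (no ties assumed)."""
    n = len(ranks_a)
    score = 0
    for i, j in combinations(range(n), 2):
        # +1 when the two rankings order the pair alike, -1 when they disagree.
        product = (ranks_a[i] - ranks_a[j]) * (ranks_b[i] - ranks_b[j])
        if product > 0:
            score += 1
        elif product < 0:
            score -= 1
    return score / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Four individuals; the two judges differ only on the middle pair.
    print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```

In the small example five of the six pairs are ordered alike and one is reversed, so the score is 4 out of a possible 6.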

In view of the reasonable agreement between the judges it was possible to obtain a 
combined ranking for each foreman for each quality by addition of the three ranks and re- 
ranking of the ten totals in each case. This is as far as this particular investigation, which is 
still in progress, has yet reached. The devising of a suitable psychological test for these 
qualities presents peculiar difficulties, and the test needs careful checking for reliability 
before assessing its value as a selective instrument. However, a reasonable criterion for 
various qualities needed in good foremanship is now available in this instance, and when 
the test rankings are finally obtained their association with the quality rankings may be 
examined. 

By their nature, the statistical methods used in this example are applicable to small 
populations only. If some statistician could evolve modifications making them useful for 
larger groups or develop a method of combining results from several small groups, apart from 
averaging a number of values of $\tau$, he would benefit the industrial psychologist enormously
and help to rid psychological research of a very dangerous tendency to the indiscriminate 
use of elaborate analytical techniques. One other direction in which statistical research would 
be very welcome would be in the development of median statistics, for quite often in psychological
work the nature and distribution of the data are such that means and standard
deviations are almost meaningless.


A USEFUL METHOD FOR THE ROUTINE ESTIMATION OF
DISPERSION FROM LARGE SAMPLES 


By A. E. JONES, Rothamsted Experimental Station 


1. INTRODUCTION 


It is often possible, in certain types of mass production, to use a large sample of articles for 
simple routine inspection and to find with ease the articles with more extreme values of the 
characteristic measured. Examples of this are articles which undergo a routine check on 
their length or weight, in which case extreme values can be sorted out either by sight, or 
by use of Go-No Go checks on a balance set at two suitable weights.

In these cases, a great deal of labour can be saved, if the dispersion is estimated from these 
extreme values, which may comprise only about 5% of the total. Such an estimate of
dispersion may be used in controlling variability by specifying limits for this estimate. One 
method of specifying the variability, which avoids the complication of subdividing the 
sample, is to lay down limits for the difference between the sum of the r highest and r lowest 
values observed in the sample. 

In this paper it will be shown how the mean, variance, and also higher moments of this 
difference can be found. Approximate formulae, which are reasonably easy to calculate, 
are given for the mean and variance of the difference. These should be satisfactory for most 
practical purposes. In Table 1 are given exact values of the mean and variance of the
difference in the case when the parent population is Gaussian (normal) for selected sample 
sizes and values of r. The mean and variance with other parent populations may be calculated 
by applying equations (22) and (25) to Tables 3 and 4. 


2. GENERAL FORMULA FOR THE MEAN 
Let n independent observed values $x_1, x_2, \ldots, x_n$ form the sample and suppose $x_1, x_2, \ldots, x_n$
to be in decreasing order of magnitude.

Denote the r values greater than $x_{r+1}$ by $x'_i$ $(i = 1, \ldots, r)$ and the r values less than $x_{n-r}$
by $x''_j$ $(j = 1, \ldots, r)$. It should be noted that $(x'_i)$ and $(x''_j)$ are not themselves arranged in
order of magnitude.

Assume the parent distribution to have a finite elementary probability law, say $f(x)$,
for all x, such that the first two moments exist.

Let $p_1(x'_i \mid x_{r+1})$ and $p_2(x''_j \mid x_{n-r})$ be the elementary probability laws for $x'_i$ and $x''_j$, given the
values of $x_{r+1}$ and $x_{n-r}$. Then

$$p_1(x'_i \mid x_{r+1}) = f(x'_i) \Big/ \int_{x_{r+1}}^{\infty} f(x)\,dx, \qquad p_2(x''_j \mid x_{n-r}) = f(x''_j) \Big/ \int_{-\infty}^{x_{n-r}} f(x)\,dx. \tag{1}$$

Denoting the difference between the sums of the r highest and r lowest values by S,

$$S = \sum_{i=1}^{r} x'_i - \sum_{j=1}^{r} x''_j = S_1 - S_2,$$

where $S_1 = \sum_{i=1}^{r} x'_i$, $S_2 = \sum_{j=1}^{r} x''_j$.
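
The statistic S is simple to compute once the sample is sorted. The following Python sketch shows one way of doing so; the function name and the guard on r are assumptions of the sketch.

```python
def extreme_sum_difference(sample, r):
    """Return S: the sum of the r highest values minus the sum of the r lowest."""
    if not 0 < r <= len(sample) // 2:
        raise ValueError("r must be positive and at most half the sample size")
    ordered = sorted(sample, reverse=True)  # x_1 >= x_2 >= ... >= x_n
    s1 = sum(ordered[:r])                   # the r highest values
    s2 = sum(ordered[-r:])                  # the r lowest values
    return s1 - s2


if __name__ == "__main__":
    print(extreme_sum_difference([3, 1, 4, 1, 5, 9, 2, 6], 2))
```

In routine inspection only the 2r extreme articles would be measured exactly, which is the saving of labour the method aims at; the full sort above is merely convenient in a sketch.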
From (1),

$$E(x_i'^{\,h} \mid x_{r+1}) = \int_{x_{r+1}}^{\infty} x^h f(x)\,dx \Big/ \int_{x_{r+1}}^{\infty} f(x)\,dx = \frac{\mu_h(x_{r+1})}{\mu_0(x_{r+1})}, \tag{2}$$

where

$$\mu_h(x_{r+1}) = \int_{x_{r+1}}^{\infty} x^h f(x)\,dx.$$

Therefore

$$E(x_i'^{\,h}) = \int_{-\infty}^{\infty} \frac{\mu_h(x_{r+1})}{\mu_0(x_{r+1})}\, p(x_{r+1})\, dx_{r+1}, \tag{3}$$

where $p(x_{r+1})$ is the elementary probability law of $x_{r+1}$. Now

$$p(x_{r+1}) = \frac{n!}{r!\,(n-r-1)!}\,[\mu_0(x_{r+1})]^r\,[1 - \mu_0(x_{r+1})]^{n-r-1} f(x_{r+1}). \tag{4}$$

Hence (3) becomes

$$E(x_i'^{\,h}) = \int_0^1 \frac{\mu_h(x_{r+1})}{\mu_0}\,\frac{n!}{r!\,(n-r-1)!}\,\mu_0^r (1 - \mu_0)^{n-r-1}\,d\mu_0 \tag{5}$$

($\mu_h(x_{r+1})$ being expressed in terms of $\mu_0$). Similarly

$$E(x_j''^{\,h}) = \int_0^1 \frac{\nu_h(x_{n-r})}{\nu_0}\,\frac{n!}{r!\,(n-r-1)!}\,\nu_0^r (1 - \nu_0)^{n-r-1}\,d\nu_0, \tag{6}$$

$\nu_h(x_{n-r})$ being supposed expressed in terms of $\nu_0$, where

$$\nu_h(x_{n-r}) = \int_{-\infty}^{x_{n-r}} x^h f(x)\,dx.$$

Hence from (5) and (6), the expected value, or mean, of S is given by

$$E(S) = r\int_0^1 \frac{\mu_1(x_{r+1})}{\mu_0}\,\frac{n!}{r!\,(n-r-1)!}\,\mu_0^r (1 - \mu_0)^{n-r-1}\,d\mu_0 - r\int_0^1 \frac{\nu_1(x_{n-r})}{\nu_0}\,\frac{n!}{r!\,(n-r-1)!}\,\nu_0^r (1 - \nu_0)^{n-r-1}\,d\nu_0. \tag{7}$$

Thus, if the probability law f(x) be known, the mean value of S can be obtained by numerical
integration. In the special case of a symmetrical distribution with mean zero, we have

$$E(S) = 2r\int_0^1 \frac{\mu_1(x_{r+1})}{\mu_0}\,\frac{n!}{r!\,(n-r-1)!}\,\mu_0^r (1 - \mu_0)^{n-r-1}\,d\mu_0.$$

Tables of $\mu_0(x)$, $\mu_1(x)$, $\mu_2(x)$ for a normally distributed variable are given in Tables for
Statisticians and Biometricians, Pt. 1, Table IX (K. Pearson, 1930). These considerably
reduce the labour involved in computing E(S) and have been used in the preparation of
Table 1.
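
As a hedged illustration of the numerical integration, the Python sketch below evaluates the symmetrical form of E(S) for a unit normal parent. It integrates over $x_{r+1}$ with the density (4), which is the same integral before the change of variable to $\mu_0$. The trapezoidal rule, the integration limits and the function names are choices made for this sketch only; for the unit normal, $\mu_1(x)$ equals the density at x.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_upper_tail(x):
    """mu_0(x): probability that a unit normal variable exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mean_of_s_normal(n, r, lower=-8.0, upper=8.0, steps=16000):
    """E(S) for a unit normal parent by the trapezoidal rule (symmetrical case)."""
    coefficient = math.factorial(n) // (math.factorial(r) * math.factorial(n - r - 1))

    def integrand(x):
        mu0 = normal_upper_tail(x)
        mu1 = normal_pdf(x)  # integral of t*f(t) from x to infinity for the unit normal
        density = coefficient * mu0 ** r * (1.0 - mu0) ** (n - r - 1) * normal_pdf(x)
        return (mu1 / mu0) * density

    width = (upper - lower) / steps
    total = 0.5 * (integrand(lower) + integrand(upper))
    for k in range(1, steps):
        total += integrand(lower + k * width)
    return 2.0 * r * width * total


if __name__ == "__main__":
    print(round(mean_of_s_normal(100, 5), 2))
```

For n = 100 and r = 5 the result should lie close to the corresponding entry of Table 1.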

3. GENERAL FORMULA FOR VARIANCE 

The variance of S may be obtained by a method similar to that used in finding the mean. 
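Thus, first an expression is derived for the conditional value of the variance, $x_{r+1}$ and $x_{n-r}$
being fixed. The unconditioned variance is then obtained by taking the expected value of
the conditional variance over variation of $x_{r+1}$ and $x_{n-r}$. Thus

$$E[\{S - E(S)\}^2 \mid x_{r+1}, x_{n-r}] = E[\{S_1 - E(S_1)\}^2 \mid x_{r+1}] + E[\{S_2 - E(S_2)\}^2 \mid x_{n-r}] - 2E[\{S_1 - E(S_1)\}\{S_2 - E(S_2)\} \mid x_{r+1}, x_{n-r}]. \tag{8}$$

The first term on the right-hand side of (8) may be dealt with as follows:

$$E[\{S_1 - E(S_1)\}^2 \mid x_{r+1}] = E[\{S_1 - E(S_1 \mid x_{r+1})\}^2 \mid x_{r+1}] + \{E(S_1 \mid x_{r+1}) - E(S_1)\}^2$$

$$= E\Big[\Big\{\sum_{i=1}^{r} [x'_i - E(x'_i \mid x_{r+1})]\Big\}^2 \,\Big|\, x_{r+1}\Big] + r^2\{E(x'_i \mid x_{r+1}) - E(x'_i)\}^2$$

$$= r \times [\text{Variance of } x'_i,\ x_{r+1} \text{ being fixed}] + r^2\left\{\frac{\mu_1(x_{r+1})}{\mu_0(x_{r+1})} - E\left(\frac{\mu_1(x_{r+1})}{\mu_0(x_{r+1})}\right)\right\}^2. \tag{9}$$

Also, since

$$E[\{x'_i - E(x'_i \mid x_{r+1})\}^2 \mid x_{r+1}] = E(x_i'^{\,2} \mid x_{r+1}) - \{E(x'_i \mid x_{r+1})\}^2 = \frac{\mu_2(x_{r+1})}{\mu_0(x_{r+1})} - \frac{\mu_1^2(x_{r+1})}{\mu_0^2(x_{r+1})}, \tag{10}$$

equation (9) becomes

$$E[\{S_1 - E(S_1)\}^2 \mid x_{r+1}] = r\left\{\frac{\mu_2(x_{r+1})}{\mu_0(x_{r+1})} - \frac{\mu_1^2(x_{r+1})}{\mu_0^2(x_{r+1})}\right\} + r^2\left\{\frac{\mu_1(x_{r+1})}{\mu_0(x_{r+1})} - E\left(\frac{\mu_1(x_{r+1})}{\mu_0(x_{r+1})}\right)\right\}^2. \tag{11}$$

Similarly, we obtain for the second term on the right-hand side of (8):

$$E[\{S_2 - E(S_2)\}^2 \mid x_{n-r}] = r\left\{\frac{\nu_2(x_{n-r})}{\nu_0(x_{n-r})} - \frac{\nu_1^2(x_{n-r})}{\nu_0^2(x_{n-r})}\right\} + r^2\left\{\frac{\nu_1(x_{n-r})}{\nu_0(x_{n-r})} - E\left(\frac{\nu_1(x_{n-r})}{\nu_0(x_{n-r})}\right)\right\}^2. \tag{12}$$

If $x_{r+1}$ and $x_{n-r}$ are fixed, $S_1$ and $S_2$ are independent and so

$$E[\{S_1 - E(S_1)\}\{S_2 - E(S_2)\} \mid x_{r+1}, x_{n-r}] = \{E[S_1 \mid x_{r+1}] - E(S_1)\}\{E[S_2 \mid x_{n-r}] - E(S_2)\}. \tag{13}$$

Substituting (11), (12) and (13) in (8) and integrating over the joint probability distribution
of $x_{r+1}$ and $x_{n-r}$, we obtain the unconditioned expected value

$$E[\{S - E(S)\}^2] = E\big[E[\{S - E(S)\}^2 \mid x_{r+1}, x_{n-r}]\big]$$

$$= E\left[r\left(\frac{\mu_2}{\mu_0} - \frac{\mu_1^2}{\mu_0^2}\right) + r^2\left\{\frac{\mu_1}{\mu_0} - E\left(\frac{\mu_1}{\mu_0}\right)\right\}^2\right] + E\left[r\left(\frac{\nu_2}{\nu_0} - \frac{\nu_1^2}{\nu_0^2}\right) + r^2\left\{\frac{\nu_1}{\nu_0} - E\left(\frac{\nu_1}{\nu_0}\right)\right\}^2\right] - 2r^2 E\left[\left\{\frac{\mu_1}{\mu_0} - E\left(\frac{\mu_1}{\mu_0}\right)\right\}\left\{\frac{\nu_1}{\nu_0} - E\left(\frac{\nu_1}{\nu_0}\right)\right\}\right], \tag{14}$$

where $\mu_0, \mu_1, \mu_2, \nu_0, \nu_1, \nu_2$ have been written in place of $\mu_0(x_{r+1}), \mu_1(x_{r+1}), \mu_2(x_{r+1}), \nu_0(x_{n-r}),
\nu_1(x_{n-r}), \nu_2(x_{n-r})$ for conciseness.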

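The first term on the right-hand side of (14) is the expectation of a function of $x_{r+1}$ only,
and can therefore be expressed as a single integral. Similarly, the second term is a function
of $x_{n-r}$ only and can also be expressed as a single integral.

The third term which arises from the correlation between $x_{r+1}$ and $x_{n-r}$ is a double integral.
However, its value can be estimated roughly by the following method which also indicates
that the absolute magnitude of this term is small provided r/n is small.

It is known that, using the same abbreviations as in (14), the joint elementary probability
law of $\mu_0(x_{r+1})$ and $\nu_0(x_{n-r})$ is

$$p(\mu_0, \nu_0) = \frac{n!}{(r!)^2\,(n-2r-2)!}\,\mu_0^r\,\nu_0^r\,(1 - \mu_0 - \nu_0)^{n-2r-2}. \tag{15}$$

By taking logarithms and expanding $\mu_0$ and $\nu_0$ about their respective means we obtain the
approximation

$$p(\mu_0, \nu_0) \propto \exp\left[-\frac{n^2}{2}\left\{(u_1^2 + u_2^2)\left(\frac{1}{r} + \frac{1}{n-2r-2}\right) + \frac{2u_1u_2}{n-2r-2}\right\}\right],$$

where $u_1 = \mu_0 - E(\mu_0)$ and $u_2 = \nu_0 - E(\nu_0)$.

Hence the correlation coefficient of $u_1$ and $u_2$ (or $\mu_0$ and $\nu_0$) is very nearly $-r/(n-r)$. Also to
the first order

$$\frac{\mu_1}{\mu_0} - E\left(\frac{\mu_1}{\mu_0}\right) \propto -u_1, \qquad \frac{\nu_1}{\nu_0} - E\left(\frac{\nu_1}{\nu_0}\right) \propto +u_2.$$

Therefore

$$E\left[\left\{\frac{\mu_1}{\mu_0} - E\left(\frac{\mu_1}{\mu_0}\right)\right\}\left\{\frac{\nu_1}{\nu_0} - E\left(\frac{\nu_1}{\nu_0}\right)\right\}\right] \approx \frac{r}{n-r}\sqrt{\left(\text{Variance of } \frac{\mu_1}{\mu_0}\right) \times \left(\text{Variance of } \frac{\nu_1}{\nu_0}\right)}. \tag{16}$$

Evidently the effect of the correlation between $x_{r+1}$ and $x_{n-r}$ will be small if r/n is small.
If the approximate expression (16) is used in (14) the resulting accuracy should be adequate.
From (14) and (16) we have

$$\text{Variance of } S = r(G_1 + G_2) + r(r-1)(H_1 + H_2) - \frac{2r^3}{n-r}\sqrt{H_1 H_2}, \tag{17}$$

where

$$G_1 = \int_0^1 \left[\frac{\mu_2(x_{r+1})}{\mu_0} - \left\{E\left(\frac{\mu_1}{\mu_0}\right)\right\}^2\right] \frac{n!}{r!\,(n-r-1)!}\,\mu_0^r (1 - \mu_0)^{n-r-1}\,d\mu_0, \tag{18·1}$$

$$G_2 = \int_0^1 \left[\frac{\nu_2(x_{n-r})}{\nu_0} - \left\{E\left(\frac{\nu_1}{\nu_0}\right)\right\}^2\right] \frac{n!}{r!\,(n-r-1)!}\,\nu_0^r (1 - \nu_0)^{n-r-1}\,d\nu_0, \tag{18·2}$$

$$H_1 = \int_0^1 \left[\frac{\mu_1^2(x_{r+1})}{\mu_0^2} - \left\{E\left(\frac{\mu_1}{\mu_0}\right)\right\}^2\right] \frac{n!}{r!\,(n-r-1)!}\,\mu_0^r (1 - \mu_0)^{n-r-1}\,d\mu_0, \tag{18·3}$$

$$H_2 = \int_0^1 \left[\frac{\nu_1^2(x_{n-r})}{\nu_0^2} - \left\{E\left(\frac{\nu_1}{\nu_0}\right)\right\}^2\right] \frac{n!}{r!\,(n-r-1)!}\,\nu_0^r (1 - \nu_0)^{n-r-1}\,d\nu_0. \tag{18·4}$$

Of course, as in equations (5) and (6), functions of $x_{r+1}$, $x_{n-r}$ in the above four expressions
[e.g. $\mu_2(x_{r+1})$, $\nu_2(x_{n-r})$] are supposed to be expressed in terms of $\mu_0$ and $\nu_0$ [i.e. $\mu_0(x_{r+1})$ and
$\nu_0(x_{n-r})$].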

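The higher order moments may also be evaluated by the same method. The work involved
unfortunately becomes heavy for practical purposes. The third moment of $S_1$ about its
mean, for example, is

$$E[\{S_1 - E(S_1)\}^3] = r\,E\left[\frac{\mu_3}{\mu_0} - \frac{3\mu_1\mu_2}{\mu_0^2} + \frac{2\mu_1^3}{\mu_0^3}\right] + 3r^2 E\left[\left(\frac{\mu_2}{\mu_0} - \frac{\mu_1^2}{\mu_0^2}\right)\left\{\frac{\mu_1}{\mu_0} - E\left(\frac{\mu_1}{\mu_0}\right)\right\}\right] + r^3 E\left[\left\{\frac{\mu_1}{\mu_0} - E\left(\frac{\mu_1}{\mu_0}\right)\right\}^3\right].$$

The Moment Generating Function can be written

$$E(e^{S_1 t}) = \frac{n!}{r!\,(n-r-1)!}\int_{-\infty}^{\infty} \left[\int_{x_{r+1}}^{\infty} e^{xt} f(x)\,dx\right]^r \left[1 - \int_{x_{r+1}}^{\infty} f(x)\,dx\right]^{n-r-1} f(x_{r+1})\,dx_{r+1}.$$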

It seems likely that for population distributions likely to occur in practice, the distribu- 
tion of S will tend to the normal law as r becomes large, provided r/n remains small. 


4. FURTHER APPROXIMATIONS FOR MEAN AND VARIANCE OF S 
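First consider the well-known equality

$$\int_0^1 y^{\alpha-1}(1-y)^{\beta-1}\,dy = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}.$$

Differentiating both sides with respect to $\alpha$ we have

$$\int_0^1 \log y \; y^{\alpha-1}(1-y)^{\beta-1}\,dy = -\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} \sum_{k=\alpha}^{\alpha+\beta-1} \frac{1}{k},$$

provided $\alpha$ and $\beta$ are integers. Differentiating once again

$$\int_0^1 (\log y)^2 \, y^{\alpha-1}(1-y)^{\beta-1}\,dy = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} \left[\left(\sum_{k=\alpha}^{\alpha+\beta-1} \frac{1}{k}\right)^2 + \sum_{k=\alpha}^{\alpha+\beta-1} \frac{1}{k^2}\right].$$

Hence

$$\int_0^1 \left(\log y + \sum_{k=\alpha}^{\alpha+\beta-1} \frac{1}{k}\right)^2 y^{\alpha-1}(1-y)^{\beta-1}\,dy = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} \sum_{k=\alpha}^{\alpha+\beta-1} \frac{1}{k^2},$$

and in general

$$\int_0^1 \left(\log y + \sum_{k=\alpha}^{\alpha+\beta-1} \frac{1}{k}\right)^t y^{\alpha-1}(1-y)^{\beta-1}\,dy = (-1)^t (t-1)!\,\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} \sum_{k=\alpha}^{\alpha+\beta-1} \frac{1}{k^t}.$$

Now consider the variable $L = \log \mu_0(x_{r+1})$. Since

$$p(\mu_0) = \frac{n!}{r!\,(n-r-1)!}\,\mu_0^r (1 - \mu_0)^{n-r-1},$$

it follows from the results just obtained that the central moments of L are

$$\bar{L} = E(L) = -\sum_{k=r+1}^{n} \frac{1}{k}, \tag{20·1}$$

$$E[(L - \bar{L})^2] = \sum_{k=r+1}^{n} \frac{1}{k^2}, \tag{20·2}$$

$$E[(L - \bar{L})^t] = (-1)^t (t-1)! \sum_{k=r+1}^{n} \frac{1}{k^t}. \tag{20·3}$$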

The approximations which we shall now obtain for the mean and variance of S apply to
cases where the population distribution can be represented approximately as descending
exponentially in the region of $\mu_0 = r/n$ for x increasing and in the region of $\nu_0 = r/n$ for x
decreasing. They will be obtained by expanding the functions $\mu_1(x_{r+1})/\mu_0(x_{r+1})$,
$\mu_2(x_{r+1})/\mu_0(x_{r+1})$, etc., as Taylor series in $L = \log \mu_0(x_{r+1})$ about the expected value $(\bar{L})$ of L.

Defining $\xi$ by the equation $\bar{L} = \log \mu_0(\xi)$ we have, neglecting second and higher order
terms,

$$E\left[\frac{\mu_1(x_{r+1})}{\mu_0(x_{r+1})}\right] = \frac{\mu_1(\xi)}{\mu_0(\xi)}. \tag{21}$$

Hence from (21)

$$E(x'_i) = \frac{\mu_1(\xi)}{\mu_0(\xi)},$$

where $\log \mu_0(\xi) = \bar{L} = -\sum_{k=r+1}^{n} 1/k$, i.e.

$$\mu_0(\xi) = \int_{\xi}^{\infty} f(x)\,dx = \exp\left(-\sum_{k=r+1}^{n} \frac{1}{k}\right). \tag{22}$$

Similarly

$$E(x''_j) = E\left[\frac{\nu_1(x_{n-r})}{\nu_0(x_{n-r})}\right] = \frac{\nu_1(\eta)}{\nu_0(\eta)},$$

where

$$\nu_0(\eta) = \int_{-\infty}^{\eta} f(x)\,dx = \exp\left(-\sum_{k=r+1}^{n} \frac{1}{k}\right). \tag{23}$$

Hence

$$E(S) = r\left[\frac{\mu_1(\xi)}{\mu_0(\xi)} - \frac{\nu_1(\eta)}{\nu_0(\eta)}\right]. \tag{24}$$

This provides an approximate formula for the mean of S.
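
The two sums over k that drive these approximations are easy to evaluate directly. The Python sketch below computes the tail probability of equations (22) and (23) and the second moment of L from (20·2); the function names are hypothetical.

```python
import math


def tail_probability(n, r):
    """mu_0(xi) = nu_0(eta) = exp(-sum_{k=r+1}^{n} 1/k), equations (22) and (23)."""
    return math.exp(-sum(1.0 / k for k in range(r + 1, n + 1)))


def sum_inverse_squares(n, r):
    """sum_{k=r+1}^{n} 1/k^2, the variance of L in (20·2)."""
    return sum(1.0 / (k * k) for k in range(r + 1, n + 1))


if __name__ == "__main__":
    print(round(tail_probability(100, 5), 5))
    print(round(sum_inverse_squares(100, 5), 4))
```

For n = 100 and r = 5 the tail probability is a little larger than r/n = 0.05, so that $\xi$ lies slightly nearer the centre of the distribution than the upper 5% point.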
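To obtain an approximate formula for the variance of S first consider the terms on the
right-hand side of equation (17). Neglecting terms of higher order than the second, and
remembering that $E(L - \bar{L}) = 0$, we have

$$G_1 = E\left[\frac{\mu_2(x_{r+1})}{\mu_0(x_{r+1})}\right] - \left\{E\left[\frac{\mu_1(x_{r+1})}{\mu_0(x_{r+1})}\right]\right\}^2 = \left[\frac{\mu_2}{\mu_0} - \frac{\mu_1^2}{\mu_0^2}\right]_{\xi} + \frac{1}{2}\left[\frac{d^2}{dL^2}\left(\frac{\mu_2}{\mu_0}\right) - \frac{2\mu_1}{\mu_0}\,\frac{d^2}{dL^2}\left(\frac{\mu_1}{\mu_0}\right)\right]_{\xi} \sum_{k=r+1}^{n} \frac{1}{k^2},$$

i.e.

$$G_1 = \left(\frac{\mu_2}{\mu_0} - \frac{\mu_1^2}{\mu_0^2}\right) + \frac{1}{2}\left[\left(\frac{\mu_2}{\mu_0} - \frac{\mu_1^2}{\mu_0^2}\right) - \left(\frac{\mu_1}{\mu_0} - \xi\right)^2\right] \sum_{k=r+1}^{n} \frac{1}{k^2} + \frac{\mu_1 - \xi\mu_0}{f(\xi)} \sum_{k=r+1}^{n} \frac{1}{k^2},$$

the functions $\mu_0, \mu_1, \mu_2$ being evaluated at $\xi$.

Also from (18) and (19)

$$H_1 = E\left[\frac{\mu_1^2(x_{r+1})}{\mu_0^2(x_{r+1})}\right] - \left\{E\left[\frac{\mu_1(x_{r+1})}{\mu_0(x_{r+1})}\right]\right\}^2 = \left(\frac{\mu_1(\xi)}{\mu_0(\xi)} - \xi\right)^2 \sum_{k=r+1}^{n} \frac{1}{k^2}.$$

Hence

$$rG_1 + r(r-1)H_1 = r\left(\frac{\mu_2(\xi)}{\mu_0(\xi)} - \frac{\mu_1^2(\xi)}{\mu_0^2(\xi)}\right)\left(1 + \frac{1}{2}\sum_{k=r+1}^{n} \frac{1}{k^2}\right) + r\left(r - \frac{3}{2}\right)\left(\frac{\mu_1(\xi)}{\mu_0(\xi)} - \xi\right)^2 \sum_{k=r+1}^{n} \frac{1}{k^2} + \frac{r[\mu_1(\xi) - \xi\mu_0(\xi)]}{f(\xi)} \sum_{k=r+1}^{n} \frac{1}{k^2}. \tag{25}$$

The last term on the right-hand side of (25) involves $f(\xi)$ which will be generally rather
difficult to estimate, since $\xi$ will be in the tail of the distribution. It will therefore be desirable
to be able to make some approximation to this term. The following approximation, which
may be useful, is based on the assumption of an exponential rate of decrease of the probability
density as x increases from $\xi$:

$$\frac{\mu_0(\xi)}{f(\xi)} = \frac{\int_{\xi}^{\infty} f(x)\,dx}{f(\xi)} \approx \frac{\int_{\xi}^{\infty} (x - \xi) f(x)\,dx}{\int_{\xi}^{\infty} f(x)\,dx} = \frac{\mu_1(\xi)}{\mu_0(\xi)} - \xi.$$

From (25) we have

$$rG_1 + r(r-1)H_1 = r\left(\frac{\mu_2(\xi)}{\mu_0(\xi)} - \frac{\mu_1^2(\xi)}{\mu_0^2(\xi)}\right)\left(1 + \frac{1}{2}\sum_{k=r+1}^{n} \frac{1}{k^2}\right) + r\left(r - \frac{1}{2}\right)\left(\frac{\mu_1(\xi)}{\mu_0(\xi)} - \xi\right)^2 \sum_{k=r+1}^{n} \frac{1}{k^2}. \tag{26}$$

Similarly, approximate expressions may be obtained for $G_2$ and $H_2$. So from (17),

$$\text{Variance of } S = r\left[\frac{\mu_2(\xi)}{\mu_0(\xi)} - \frac{\mu_1^2(\xi)}{\mu_0^2(\xi)} + \frac{\nu_2(\eta)}{\nu_0(\eta)} - \frac{\nu_1^2(\eta)}{\nu_0^2(\eta)}\right]\left(1 + \frac{1}{2}\sum_{k=r+1}^{n} \frac{1}{k^2}\right)$$

$$\qquad + r\sum_{k=r+1}^{n} \frac{1}{k^2}\left[\left(r - \frac{1}{2}\right)\left\{\left(\frac{\mu_1(\xi)}{\mu_0(\xi)} - \xi\right)^2 + \left(\frac{\nu_1(\eta)}{\nu_0(\eta)} - \eta\right)^2\right\} - \frac{2r^2}{n-r}\left(\frac{\mu_1(\xi)}{\mu_0(\xi)} - \xi\right)\left(\eta - \frac{\nu_1(\eta)}{\nu_0(\eta)}\right)\right]. \tag{27}$$

It will be seen that only a knowledge of the tails of the parent distribution from $-\infty$ to $\eta$,
and $\xi$ to $+\infty$ is required to evaluate equations (24) and (27). Also, $(\mu_0(\xi) - r/n)$ and $(\nu_0(\eta) - r/n)$
will be positive and fairly small.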
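
To show how (24) and (27) might be evaluated in practice, here is a Python sketch for a unit normal parent. It relies on two standard properties of the unit normal, $\mu_1(\xi) = f(\xi)$ and $\mu_2(\xi) = \mu_0(\xi) + \xi f(\xi)$, and on symmetry, so that $\eta = -\xi$. The function name and the use of `statistics.NormalDist` are choices made for this sketch, and the form of (27) is the one written above.

```python
import math
from statistics import NormalDist


def approximate_mean_and_se_normal(n, r):
    """Mean and standard error of S for a unit normal parent from (24) and (27)."""
    sum_k1 = sum(1.0 / k for k in range(r + 1, n + 1))
    sum_k2 = sum(1.0 / (k * k) for k in range(r + 1, n + 1))

    unit_normal = NormalDist()
    tail = math.exp(-sum_k1)                  # mu_0(xi), equation (22)
    xi = unit_normal.inv_cdf(1.0 - tail)      # point cutting off that upper tail
    tail_mean = unit_normal.pdf(xi) / tail    # mu_1/mu_0 at xi
    tail_var = 1.0 + xi * tail_mean - tail_mean ** 2  # mu_2/mu_0 - (mu_1/mu_0)^2
    gap = tail_mean - xi                      # equals eta - nu_1/nu_0 by symmetry

    mean_s = 2.0 * r * tail_mean              # equation (24) with both tails alike
    variance_s = 2.0 * r * tail_var * (1.0 + 0.5 * sum_k2) + r * sum_k2 * (
        (r - 0.5) * 2.0 * gap ** 2 - (2.0 * r * r / (n - r)) * gap ** 2
    )
    return mean_s, math.sqrt(variance_s)


if __name__ == "__main__":
    mean_s, se_s = approximate_mean_and_se_normal(100, 5)
    print(round(mean_s, 1), round(se_s, 2))
```

For n = 100, r = 5 and for n = 1000, r = 20 the sketch gives values in close agreement with the corresponding entries of Table 1.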











5. PRACTICAL APPLICATION 


It would be inadvisable to use this method, except in cases when approximations (24) and
(27) apply, i.e. when the parent probability density function decreases steadily in each tail
of the distribution. Also, it would be rather difficult, as equation (17) requires considerable
computation.

Table 1. Mean and standard error of S for a normal distribution
with unit standard deviation

                                  n
   r       100      200      400      600      800     1000

   5      20.2*    23.0*    25.6     26.9*    27.9     28.6*
           1.69*    1.56*    1.45     1.39*    1.36     1.33*

   7      26.5     30.5     34.2     36.2     37.6     38.7
           2.09     1.92     1.78     1.71     1.67     1.63

  10       -       40.9*    46.4     49.4     51.4     53.0*
                    2.41*    2.22     2.13     2.07     2.03*

  12       -       47.3     54.1     57.7     60.2     62.1
                    2.70     2.49     2.38     2.31     2.26

  16       -        -       68.5     73.6     77.4     79.7
                             2.98     2.84     2.76     2.70

  20       -        -       82.1     88.7     93.1     96.5*
                             3.42     3.26     3.16     3.09*

For each value of r the upper line gives the mean and the lower line the standard error.
* This indicates those figures which have been checked by exact computations.

Table 2. Values of (Standard error of S / Mean of S) x 100 for samples from a normal distribution

                                  n
   r               100     200     400     600     800    1000

   5               8.4     6.8     5.7     5.2     4.9     4.7
   7               7.9     6.3     5.2     4.7     4.4     4.2
  10                -      5.9     4.8     4.3     4.0     3.8
  12                -      5.7     4.6     4.1     3.8     3.6
  16                -       -      4.4     3.9     3.6     3.4
  20                -       -      4.2     3.7     3.4     3.2

Corresponding      7.1     5.0     3.5     2.9     2.5     2.2
ratio for s*

* The bottom row gives 100 x standard error/mean, for the standard deviation s when calculated as
$s = \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / (n-1)}$ from a random, normal sample of size n.


Table 1 gives values of the mean and standard error of S for a normal distribution with
unit variance. It will be seen that efficiency* is not much improved by increasing r/n beyond
4%. Table 1 has been mostly evaluated from the formulae (24) and (27), but a number of
exact computations have been made and these show that the approximations are accurate
to a unit in the last figure shown in Table 1. More precisely, the maximum error was
0.2% in the mean and 0.7% in the standard deviation of S.

* Efficiency: If E is the efficiency of an estimate of the standard deviation made from a sample of size n,
then the best possible estimate of the standard deviation from a sample of size nE would have the same
accuracy as measured by its standard error.

Table 3. Values of $\exp\left(-\sum_{k=r+1}^{n} \frac{1}{k}\right) = \mu_0(\xi)$

                                       n
   r       100       200       400       600       800      1000

   5     0.05480   0.02741   0.01376   0.00918   0.00689   0.00551
   6     0.06474   0.03245   0.01626   0.01084   0.00813   0.00651
   7     0.07468   0.03743   0.01876   0.01251   0.00938   0.00751
   8     0.08463   0.04242   0.02125   0.01417   0.01063   0.00851
   9        -      0.04740   0.02375   0.01584   0.01188   0.00951
  10        -      0.05239   0.02625   0.01751   0.01313   0.01051
  12        -      0.06236   0.03124   0.02084   0.01563   0.01251
  14        -      0.07233   0.03624   0.02417   0.01813   0.01451
  16        -         -      0.04124   0.02753   0.02063   0.01651
  18        -         -      0.04623   0.03084   0.02313   0.01851
  20        -         -      0.05123   0.03417   0.02563   0.02051
















Table 4. Values of $\sum_{k=r+1}^{n} \frac{1}{k^2}$

                                    n
   r       100      200      400      600      800     1000

   5     0.1714   0.1763   0.1788   0.1797   0.1801   0.1803
   6     0.1436   0.1486   0.1510   0.1519   0.1523   0.1526
   7     0.1232   0.1282   0.1306   0.1315   0.1319   0.1321
   8     0.1076   0.1125   0.1150   0.1158   0.1163   0.1165
   9        -     0.1002   0.1027   0.1035   0.1039   0.1042
  10        -     0.0902   0.0927   0.0935   0.0939   0.0942
  12        -     0.0750   0.0775   0.0783   0.0787   0.0790
  14        -     0.0640   0.0664   0.0673   0.0677   0.0679
  16        -        -     0.0581   0.0589   0.0593   0.0596
  18        -        -     0.0515   0.0524   0.0529   0.0532
  20        -        -     0.0463   0.0471   0.0475   0.0478




















When sampling from a normal population (in so far as the values of n and r tabled are 
appropriate), an estimate of the standard deviation, a, can be obtained by calculating S 
from the sample and dividing it by the mean S given in Table 1. The standard error of this 
estimate, expressed as a percentage of the true c, is given in Table 2. The percentage error 
of the usual estimate of o based on the sums of squares of the n observations, which is approxi- 
mately equal to 100/,/(2n), is shown at the bottom of the table. 

The exact parent probability distribution is, however, usually unknown and in order to
estimate the mean and variance of $S$ (the difference between the $r$ highest and $r$ lowest
values) in a sample of $n$, a grand sample at least ten times as large is required. If this grand
sample, say of $m$ values, is available, the mean and variance of $S$ in samples of $n$ may then be
estimated as follows:

(a) Given $n$ and $r$, Table 3 shows the corresponding values of $\gamma_1(\xi)$ and $\gamma_2(\eta)$, the
quantities defined in equations (22) and (23), for a number of values of $n$.

(b) As the parent distribution is unknown, $\xi$ and $\eta$ cannot be found from $\gamma_1(\xi)$ and
$\gamma_2(\eta)$. However, out of the grand sample about $p' = m\gamma_1$ observations may be expected to
be greater than $\xi$ and about $p'$ less than $\eta$.

(c) Denote the nearest integer to $p'$ by $p$. From the grand sample, find the $p$ largest values,
call them $y'_i$ ($i = 1, 2, \ldots, p$), and the $p$ smallest values, $y''_j$ ($j = 1, 2, \ldots, p$).
Denote the $(p+1)$th value (from the highest) by $\xi$ and the $(n-p)$th by $\eta$.

Calculate the mean and variance of the set of values $y'_i$ ($i = 1, 2, \ldots, p$). Call these $M'$
and $V'$. Similarly, find the mean and variance of $y''_j$, and denote these by $M''$ and $V''$. Then,
using equations (24) and (27),

Mean of S = r(M’—M"). (28) 
Variance of S = r(V’+V") (1 + 2, z) 

© - lk 2r2 ; 

E ple-par-a+e-po-My- 2 arp a-4)]. 0) 
k=r+1 n—? 


Values of $\sum_{k=r+1}^{n} 1/k^2$ are given in Table 4.


I would like to thank Mr N. L. Johnson for his help in preparing this paper for publication. 
INEQUALITIES IN TERMS OF MEAN RANGE

I. AN INEQUALITY HOLDING FOR ANY FREQUENCY DISTRIBUTION

Usually in statistical work the form of a frequency distribution is known, or assumed, so
that it is possible to calculate exactly the fraction of the distribution lying in a given
interval. It may happen, however, that though the mean, $\mu$, and standard deviation, $\sigma$,
have been estimated from sufficient data for errors of sampling to be neglected, nothing else
is known about the distribution. Even in this case we know by Tchebycheff's inequality
that the interval $(\mu - t\sigma, \mu + t\sigma)$ does not contain less than a fraction $1 - 1/t^2$ of the
distribution, for any $t > 1$.

If, instead of the standard deviation of the distribution, the mean range of samples of
$n$, $w_n$, is known (such a situation is likely, for example, with standard control chart
procedure), a different inequality is required, and this is given below. The new inequality is not
an exact analogue of Tchebycheff's. Instead of considering the fixed interval $(\mu - t\sigma, \mu + t\sigma)$,
we consider a variable interval $(x, x + tw_n)$ of fixed length $L = tw_n$. For a given distribution $d$,
and fixed $t$, the fraction of $d$ falling inside this interval is a function of $x$, $p_d(x, t)$ say. We know
that $p_d(x, t) \le 1$, and it therefore follows that $p_d(x, t)$ has, for all $x$, an upper bound, $p_d(t)$ say.
It is this upper bound, $p_d(t)$, that will be considered.

The actual expression of the inequality is not so simple as Tchebycheff's. In practice,
therefore, it is easier to use Fig. 1, as follows. Suppose the length of the interval, $L$, is given,
and also the mean range for samples of $n$, $w_n$ ($n = 2, 3, \ldots, 9, 10$). First find $t = L/w_n$. Then
find the ordinate of the appropriate curve in Fig. 1 with this value of $t$ as abscissa. Suppose
this ordinate is $p(t)$.

Then for any frequency distribution whatsoever, $p_d(t) \ge p(t)$.

As an example, suppose we have found that $w_5 = 3.20$ cm. and $L = 6.57$ cm. Then
$t = 6.57/3.20 = 2.05$, so that, from Fig. 1, $p(t) = 0.88$. Thus, for practical purposes, we know
that it is possible to choose an interval of length 6.57 cm. so that it will include at least
88 % of the distribution, but the inequality does not tell us how to choose this interval.

It may be desirable to have a more accurate estimate of $p(t)$ than can be obtained from
the figure. In that case Table 1 can be used. In that table, for some values of a variable $y$,
the values of a function $1/R_n(y)$ are given. (For the definition of $R_n(y)$ see § III.) Now
$1/R_n(y) = t$, and $y = 1 - p$ for the inequality we are considering. Hence we proceed as
follows: for the given value of $t = 1/R_n(y)$ find the corresponding value of $y$ by an inverse
interpolation from the table; then find the value of $p(t)$ by subtracting the value of $y$
obtained from unity. It is more accurate to use harmonic inverse interpolation to obtain
the value of $y$.
































Table 1. Values of $1/R_n(y)$
($1/R_n(y) = t$ in the general inequality, $= 2t$ in the inequality for unimodal symmetrical distributions)

                                          n
   y         2        3        4        5        6        7        8        9       10
 0.45      2.020    1.347    1.153    1.074    1.037    1.019    1.010    1.005    1.003
 0.40      2.083    1.389    1.184    1.097    1.054    1.031    1.018    1.010    1.006
 0.35      2.198    1.465    1.240    1.138    1.084    1.052    1.033    1.021    1.013
 0.30      2.381    1.587    1.330    1.206    1.134    1.090    1.061    1.042    1.029
 0.25      2.667    1.778    1.471    1.313    1.217    1.154    1.111    1.081    1.059
 0.225     2.867    1.912    1.571    1.389    1.277    1.202    1.150    1.112    1.085
 0.20      3.125    2.083    1.698    1.488    1.355    1.265    1.202    1.155    1.120
 0.175     3.462    2.309    1.866    1.619    1.461    1.352    1.273    1.215    1.171
 0.15      3.922    2.614    2.094    1.798    1.606    1.472    1.375    1.301    1.245
 0.125     4.571    3.048    2.418    2.053    1.814    1.647    1.523    1.430    1.357
 0.10      5.556    3.704    2.909    2.442    2.134    1.917    1.755    1.632    1.535
 0.075     7.207    4.805    3.733    3.098    2.677    2.378    2.155    1.983    1.847
 0.05     10.526    7.018    5.391    4.421    3.775    3.315    2.971    2.705    2.492





For example, suppose $n = 6$, $t = 1.658$. The two nearest values of $t = 1/R_n(y)$ in Table 1,
and the corresponding values of $y$ and $1/y$ are:

    $t_1 = 1.606$,   $y_1 = 0.150$,   $1/y_1 = 6.667$,
    $t_2 = 1.814$,   $y_2 = 0.125$,   $1/y_2 = 8.000$.

Since $\dfrac{t - t_1}{t_2 - t_1} = \dfrac{1}{4}$ and $\dfrac{1}{y_2} - \dfrac{1}{y_1} = \dfrac{4}{3}$, we find that $1/y = 7.000$, so that
$y = 0.143$ and the value of $p$ required is 0.857.
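
The harmonic inverse interpolation used in this example can be sketched in a few lines of Python. This is only an illustration: the function name and argument order are assumptions made here, and the two tabulated points are the Table 1 entries for $n = 6$ quoted above.

```python
def harmonic_inverse_interpolation(t, t1, y1, t2, y2):
    """Interpolate 1/y linearly in t between two tabulated points, then invert."""
    fraction = (t - t1) / (t2 - t1)          # position of t between t1 and t2
    inv_y = 1.0 / y1 + fraction * (1.0 / y2 - 1.0 / y1)
    return 1.0 / inv_y

# Table 1 entries for n = 6 that bracket t = 1.658
y = harmonic_inverse_interpolation(1.658, 1.606, 0.150, 1.814, 0.125)
p = 1.0 - y
print(round(y, 3), round(p, 3))
```

Interpolating in $1/y$ rather than in $y$ suits this table because, for small $y$, $1/R_n(y)$ behaves roughly like $1/(ny)$, so that $t$ is nearly linear in $1/y$.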


The inequality given here is the best possible of its type. This implies that, if all that is
known about a distribution is the mean range for samples of $n$, for one and only one value of
$n$, then for each $t$ there is some distribution giving values of $p_d(t)$ as near to the value of $p(t)$
given by Fig. 1 as we please. Of course, if we know something more about a distribution,
e.g. if we know the mean range for two different sample sizes, it might be possible to find
closer limits to $p_d(t)$.

The equations of the curves drawn in Fig. 1, and their derivation, are given in § III. For
values of $t$ larger than those given in Fig. 1, the approximate formula $p(t) = 1 - 1/(nt)$ is
fairly satisfactory, provided $n$ is small. For values of $n$ greater than 10, $p(t)$ can be found
from the equation given in § III.


II. AN INEQUALITY HOLDING FOR UNIMODAL SYMMETRICAL DISTRIBUTIONS 


If the mean, $\mu$, and standard deviation, $\sigma$, of a distribution are known, and, in addition, the
distribution is known to be unimodal and symmetrical, an inequality can be used which is
a considerable improvement on Tchebycheff's. The simplest form of the Gauss-Winkler
inequality states that, for a unimodal symmetrical distribution, the interval $(\mu - t\sigma, \mu + t\sigma)$
will contain not less than $1 - 4/(9t^2)$ of the distribution, for sufficiently large $t$. In fact there are
a series of such inequalities in terms of the absolute moments of different orders, and for a
slightly more general situation than the one given above. A proof of these inequalities is
given in § V below. A proof for a still more general series of inequalities of the same type is
given in Fréchet (1937). It is also proved in § V that the Gauss-Winkler is the best possible
inequality in the sense that, if only the absolute moment of one order is known, no
improvement can be made on the inequality in terms of that moment.

It is possible to obtain a precise analogue of the Gauss-Winkler inequality in the simplest
(and most important) case given above. It is convenient, because of symmetry, to change
the notation slightly when discussing this inequality. If the mean of the unimodal
symmetrical distribution is $\mu$, and the mean range for samples of $n$, $w_n$, then the fraction of $d$
falling in the interval $(\mu - tw_n, \mu + tw_n)$ is $2p_d(t)$ say. As with the inequality in Fig. 1, the
mathematical expression is rather complicated, and it is simplest to use Fig. 2. If the ordinate
of the point on the appropriate curve in Fig. 2 with abscissa $t$ is $2p(t)$, then for any $d$

    $2p(t) \le 2p_d(t)$.

The derivation of the equations of the limiting curves shown in Fig. 2 is given in § IV
below. From the equations it can be shown that, for small $n$, there is not much difference
numerically between the general inequality of § I and the more restricted inequality of this
section, except for fairly small $t$. Table 2 gives a clear comparison. For selected $y = 1 - p$,
given in the left-hand column of the table, we obtain the corresponding $t$ for the inequality
of § I. For intervals of these lengths we obtain the corresponding $2p$ for the inequality of
§ II, assuming the same $w_n$. In the table are given the values of $2q = 1 - 2p$. Thus the figures
in the body of the table can be compared directly with the values of $y$ in the left-hand
column.








Table 2. Values of $2q = 2y - \dfrac{2}{R_n(y)}\left\{y - \dfrac{1 + y^{n+1} - (1 - y)^{n+1}}{n+1}\right\}$

                                          n
   y         2        3        4        5        6        7        8        9       10
 0.45      0.327    0.327    0.309    0.285    0.259    0.236    0.215    0.196    0.179
 0.40      0.311    0.311    0.295    0.273    0.250    0.229    0.210    0.193    0.177
 0.35      0.287    0.287    0.273    0.255    0.236    0.218    0.202    0.187    0.173
 0.30      0.257    0.257    0.246    0.232    0.217    0.203    0.190    0.177    0.166
 0.25      0.222    0.222    0.214    0.203    0.193    0.183    0.173    0.164    0.155
 0.225     0.203    0.203    0.196    0.188    0.179    0.171    0.162    0.155    0.147
 0.20      0.183    0.183    0.178    0.171    0.164    0.157    0.151    0.144    0.138
 0.175     0.163    0.163    0.158    0.153    0.148    0.142    0.137    0.132    0.127
 0.15      0.141    0.141    0.138    0.134    0.130    0.126    0.122    0.119    0.115
 0.125     0.119    0.119    0.117    0.114    0.111    0.109    0.106    0.103    0.101
 0.10      0.096    0.096    0.095    0.093    0.091    0.090    0.088    0.086    0.085
 0.075     0.073    0.073    0.072    0.071    0.070    0.069    0.068    0.067    0.066
 0.05      0.049    0.049    0.049    0.048    0.048    0.047    0.047    0.047    0.046






































As in § I, it may be desirable to have a more accurate estimate of $2p(t)$ than that obtained
from the figure. Such an estimate can be obtained from Tables 1 and 2 as follows:

First, as in § I, find, for the given value of $2t = 1/R_n(y)$, the corresponding value of $y$, by
harmonic interpolation in Table 1. Next, for the more restricted inequality with which we
are now dealing, it is necessary to correct this value of $y$ to a value $2q$, which can be found
from Table 2 by direct interpolation.

Since $2p = 1 - 2q$, $2p$ can then be obtained immediately.

As an illustration, suppose that as in the example of § I, $n = 6$ and $2t = 1.658$ (with the
change of notation of this section). As before, the corresponding value of $y$ is 0.143. Table 2
gives the value of $2q$ for $y = 0.143$ as 0.125, giving 0.875 as the value of $2p$ required.
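
As a cross-check on the table look-up, the same illustration can be worked numerically. The sketch below is not the computing formula used for Table 2; it assumes the parametric form $1/(2t) = R_n(y)$ together with an expression for $q$ obtained by rearranging the condition for a minimum derived in § IV, and the function names are illustrative only.

```python
def R(n, y):
    """R_n(y) = 1 - y**n - (1 - y)**n, as defined in Section III."""
    return 1.0 - y**n - (1.0 - y)**n

def two_q(n, y):
    """Fraction 2q outside the interval at parameter y (assumed rearrangement)."""
    a = (1.0 + y**(n + 1) - (1.0 - y)**(n + 1)) / (n + 1)
    q = y - (y - a) / R(n, y)
    return 2.0 * q

n, two_t = 6, 1.658

# Solve 1/R_n(y) = 2t for y in (0, 1/2) by bisection; R_n increases with y there,
# so 1/R_n(y) decreases and a value that is too large means y is too small.
lo, hi = 1e-12, 0.5
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if 1.0 / R(n, mid) > two_t:
        lo = mid
    else:
        hi = mid
y = 0.5 * (lo + hi)

two_p = 1.0 - two_q(n, y)
print(round(y, 3), round(two_p, 3))
```

Evaluating `two_q` at tabulated values of $y$ (for instance $n = 6$, $y = 0.15$) gives a quick way of checking individual entries of Table 2.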

For large values of $t$, a fairly good approximation for the equations of the limiting curves
is $2p(t) = 1 - 1/(2nt)$. This, remembering the change of notation, is the same approximation
as that for the curves of Fig. 1. The advantage in assuming a unimodal symmetrical
distribution lies in the fact that we deal with an interval in a known position relative to the
distribution rather than in an unknown one.


Uses of these inequalities 
When, in a quality control system, some property or dimension of a product from a process
is being checked, the inspector will take small samples and often note only the mean and
range of each. Thus in time a very reliable estimate of the mean range of the process is
available. The question then sometimes arises: If the tolerance interval for this process is
fixed at, say, $l$, how many rejects will be obtained? To answer this question the inspector
would have to know the process distribution exactly, but with the help of these inequalities
he could give a partial answer. First, if he knew that the distribution was symmetrical and
unimodal, he could state that if the process mean, i.e. the mean of the distribution, were set
accurately on the drawing mean, then the percentage of rejects would never be greater than
a value obtained from the unimodal symmetrical inequality. Secondly, if the distribution
was not unimodal and symmetrical, the inspector could state that, after practical experience
had shown what was the best place to set the process mean and if this setting were held
accurately, then the percentage of rejects produced would not be greater than a value
obtained from the general inequality given here.


III. DERIVATION OF THE GENERAL INEQUALITY IN TERMS OF MEAN RANGE
The distribution function
Any frequency distribution can be represented by its distribution function $F(x)$, which is
defined as the fraction of the distribution falling on, and to the left of, the point $x$.
$F(x)$ will satisfy the following conditions:
(a) $F(x)$ is a monotonic increasing function,
(b) $\lim_{x \to -\infty} F(x) = 0$,
(c) $\lim_{x \to +\infty} F(x) = 1$,
and the inequality given below will apply to any distribution which can be represented by
a function $F(x)$ satisfying these conditions, together with (d) below.
We will use the notation

    $\lim_{h \to 0} F(x + h) = F(x + 0)$, $h$ positive,
    $\lim_{h \to 0} F(x + h) = F(x - 0)$, $h$ negative.

Then

(d) $F(x) = F(x + 0)$.

From the conditions on $F$, it follows that $F(x) = F(x - 0)$ except in an enumerable set
of points.


The mean range 


It can be shown that, for any distribution, if the mean range exists, it is given by

    $w_n = \int_{-\infty}^{\infty} R_n(F)\,dx$ (see Kendall, 1943),

where $R_n(F) = 1 - F^n - (1 - F)^n$.
Notice that, as $F$ increases from 0 to $\tfrac12$, $R_n(F)$ increases from 0 to $1 - (\tfrac12)^{n-1}$, and as $F$
increases from $\tfrac12$ to 1, $R_n(F)$ decreases from $1 - (\tfrac12)^{n-1}$ to 0.
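
The integral formula for the mean range is easy to check numerically in a case where the answer is known. The sketch below is an illustration only: the rectangular distribution on (0, 1), for which the mean range of samples of $n$ is $(n-1)/(n+1)$, is chosen here as a convenient test case, and the function names and the midpoint rule are assumptions of this example.

```python
def R(n, F):
    """R_n(F) = 1 - F**n - (1 - F)**n."""
    return 1.0 - F**n - (1.0 - F)**n

def mean_range(cdf, n, lower, upper, steps=100_000):
    """Midpoint-rule approximation to the integral of R_n(F(x)) over [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * width
        total += R(n, cdf(x))
    return total * width

def uniform_cdf(x):
    """Distribution function of the rectangular distribution on (0, 1)."""
    return min(1.0, max(0.0, x))

for n in (2, 5, 10):
    approx = mean_range(uniform_cdf, n, 0.0, 1.0)
    exact = (n - 1) / (n + 1)
    print(n, round(approx, 6), round(exact, 6))
```

The integrand vanishes wherever $F$ is 0 or 1, so for a distribution of finite range it is enough to integrate over that range.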


The limiting curves 


We will find the form of the limiting curves by finding the value of $t$ for any value of $p$,
$0 < p < 1$.

Suppose $L$ is a fixed positive number and $L = tw_n$. Consider distributions satisfying the
following conditions:

(i) There is an interval $(x', x' + L)$ such that

    $F(x' + L) - F(x' - 0) = p$,

i.e. a fraction $p$ of the distribution lies in a closed interval of length $L$.

(ii) $F(x + L) - F(x - 0) \le p$ for all $x$.

For fixed $L$ and $p$, we will find the lower bound of $w_n$ for distributions satisfying conditions
(i) and (ii). This will give an upper bound to $t$, and this will be the abscissa of the point on the
limiting curve with ordinate $p$. For the upper bound of $t$ is a monotonic increasing function
of $p$ for $0 < p < 1$. Consequently, for any $t$, no distribution can give a point below the curve.

In the first place, consider the case $p \ge \tfrac12$.

Since $F(x - 0) = F(x)$ except in a set of points which is enumerable, and therefore of
measure zero,

    $\int_a^b R_n\{F(x)\}\,dx = \int_a^b R_n\{F(x - 0)\}\,dx$ for any interval $(a, b)$.

If $x'$ is a point satisfying condition (i) above, we can write

    $w_n = \int_{-\infty}^{x'} R_n\{F(x - 0)\}\,dx + \int_{x'}^{\infty} R_n\{F(x)\}\,dx$,

remembering $F(x' - 0) \le 1 - p \le \tfrac12$.


The function $F_1$

Now we introduce a new distribution with distribution function $F_1$. Roughly speaking,
this is obtained from $F$ by compressing the distribution represented by $F$ about its median.
The formula for $w_n$ shows that this reduces the mean range of a distribution. $F_1$ represents
a finite distribution, and it has the property that, if $(x, x + L)$ lies entirely in the range of
the distribution, $F_1(x + L) = F_1(x) + p$ almost everywhere.

This property enables us to find a lower bound to the mean range of $F_1$, and therefore to
the mean range of $F$. An example shows that the bound obtained is the greatest lower bound.
We define the new function $F_1(x)$ as follows:

    If $F(x + L) - p < 0$,                               $F_1(x) = 0$,
    $F(x + L) - p \ge 0$ and $x < x'$,                    $F_1(x) = F(x + L) - p$,
    $x' \le x \le x' + L$,                                 $F_1(x) = F(x)$,
    $F(x - L - 0) + p \le 1$ and $x > x' + L$,            $F_1(x) = F(x - L - 0) + p$,
    $F(x - L - 0) + p > 1$,                               $F_1(x) = 1$.

$F_1$ is uniquely defined at every point and is a monotonic increasing function.
By condition (ii),

    $\int_{-\infty}^{x'} R_n\{F_1(x)\}\,dx \le \int_{-\infty}^{x'} R_n\{F(x - 0)\}\,dx$,

since $F_1(x) \le F(x - 0)$ and $R_n(F)$ is an increasing function of $F$ in this range.
Also by condition (ii),

    $\int_{x'+L}^{\infty} R_n\{F_1(x)\}\,dx \le \int_{x'+L}^{\infty} R_n\{F(x)\}\,dx$,

since $F_1(x) \ge F(x)$ and $R_n(F)$ is a decreasing function of $F$ in this range.

If $w'_n$ is defined as $\int_{-\infty}^{\infty} R_n\{F_1(x)\}\,dx$, then $w'_n \le w_n$.
By a Dedekindian argument, there is a point $x_0$ such that $F_1 = 0$ for $x < x_0$, $F_1 > 0$ for $x > x_0$.
Take $x_0$ as the new origin.

In the same way there is a point $x_1$ such that $F_1 < 1$ for $x < x_1$, $F_1 = 1$ for $x > x_1$.
Define $r$ by the equation $x_1 = L + r$.

Now by the definition of $F_1(x)$, $F_1(x + L) - F_1(x) = p$ for $0 < x < r$, so

    for $x < r$, $F_1(x) \le 1 - p$,
    for $x > r$, $F_1(x) \ge 1 - p$.

For any $h > 0$, $F_1(L + h) \ge p$ and $F_1(r - h) \le 1 - p$, so $F_1(L + h) \ge p \ge 1 - p \ge F_1(r - h)$.

Hence $L + h \ge r - h$ for all $h > 0$, so that $r \le L$.

By the properties of $F_1$,

    $w'_n = \int_0^{r+L} R_n(F_1)\,dx = \int_0^r \{R_n(F_1) + R_n(F_1 + p)\}\,dx + \int_r^L R_n(F_1)\,dx$.

For $r \le x \le L$, $1 - p \le F_1 \le p$, so $R_n(F_1) \ge R_n(p)$,
for $0 \le x \le r$, $F_1 \le 1 - p$, so $R_n(F_1) + R_n(F_1 + p) \ge R_n(p)$,

so $w'_n \ge (L - r) R_n(p) + r R_n(p)$, i.e. $w'_n \ge L R_n(p)$, and therefore $w_n \ge L R_n(p)$.
Hence if $L = tw_n$, for given $p \ge \tfrac12$ the upper bound of $t$ is $1/R_n(p)$. The equations of the
limiting curves are therefore, for $p \ge \tfrac12$, $t \ge \dfrac{2^{n-1}}{2^{n-1} - 1}$,

    $1 - (1 - p)^n - p^n = \dfrac{1}{t}$.
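
For readers who prefer to evaluate the limiting curve directly rather than read it from Fig. 1 or Table 1, the relation $R_n(p) = 1/t$ can be solved for $p$ by bisection. This is a minimal sketch under the assumption that only the branch $p \ge \tfrac12$ is wanted; the function name and the use of `None` for shorter intervals are choices made for this illustration.

```python
def R(n, p):
    """R_n(p) = 1 - p**n - (1 - p)**n."""
    return 1.0 - p**n - (1.0 - p)**n

def lower_bound_p(n, t, iterations=100):
    """Solve R_n(p) = 1/t for p in [1/2, 1); return None when t is too small."""
    t_at_half = 1.0 / R(n, 0.5)      # abscissa of the point p = 1/2 on the curve
    if t < t_at_half:
        return None                  # the branch p >= 1/2 does not reach this t
    lo, hi = 0.5, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        # R_n decreases as p goes from 1/2 to 1, so a value of R_n that is
        # still too large means p must be increased.
        if R(n, mid) * t > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# The example of Section I: mean range 3.20 cm for samples of 5, interval 6.57 cm
print(round(lower_bound_p(5, 6.57 / 3.20), 3))
```

The comparison is written as `R(n, mid) * t > 1.0` rather than with a division so that the end-point $p = 1$, where $R_n$ vanishes, causes no difficulty.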


Proof that the inequality found is the 'best possible'
The inequality is the best possible of its type. Consider the distribution ($\epsilon > 0$)

    $x < 0$,                        $F = 0$,
    $0 \le x < L(1 + \epsilon)$,    $F = p$,
    $L(1 + \epsilon) \le x$,        $F = 1$.

For this distribution $w_n = L(1 + \epsilon) R_n(p)$, so if $t = L/w_n$, by making $\epsilon$ sufficiently small we
can obtain a point as near the limiting curve as we please.


Derivation of equation of limiting curves for $p < \tfrac12$
The equations of the limiting curves for the case $p < \tfrac12$ are not of practical importance.
Their derivation is similar to that for the case $p \ge \tfrac12$, only it is slightly more complicated.
Suppose $\dfrac{1}{m+1} \le p < \dfrac{1}{m}$ ($m = 2, 3, 4, \ldots$).
A point exists such that $F \le \tfrac12$ to its left and $F \ge \tfrac12$ to its right. Take this point
as origin.
Define $F_1(x)$ as follows ($s = 1, 2, 3, \ldots$):

    $F_1(x) = F(x + sL) - sp$,   $-sL \le x < -(s-1)L$,
    $F_1(x) = F(x)$,             $x \ge 0$,

unless $F(x + sL) - sp < 0$, when $F_1(x) = 0$. Then

    $w_n = \int_{-\infty}^{\infty} R_n(F)\,dx \ge \int_{-\infty}^{\infty} R_n(F_1)\,dx = w'_n$.

Now define a function $F_2(x)$ as follows:

    $F_2(x) = F_1(x - sL) + sp$,   $(s-1)L < x \le sL$,
    $F_2(x) = F_1(x)$,             $x \le 0$,

unless $F_1(x - sL) + sp > 1$, when $F_2(x) = 1$.
Then $w''_n = \int_{-\infty}^{\infty} R_n(F_2)\,dx \le w'_n \le w_n$.
A point $x_0$ exists such that $F_2(x) = 0$, $x < x_0$,
                                $F_2(x) > 0$, $x > x_0$.
A point $x_1$ exists such that $F_2(x) = 1$, $x > x_1$,
                                $F_2(x) < 1$, $x < x_1$.
Define $r$ by $x_1 - x_0 = mL + r$. Take origin at $x_0$. Then $mL \le x_1 - x_0 \le (m+1)L$, as can be
shown by considering $F_2(mL + h)$ and $F_2(mL - L + r - h)$ for $h > 0$, as in the case $p \ge \tfrac12$.
By the properties of $F_2(x)$,

    $w''_n = \int_0^{mL+r} R_n(F_2)\,dx$
           $= \int_0^r \{R_n(F_2) + R_n(F_2 + p) + \ldots + R_n(F_2 + mp)\}\,dx$
             $+ \int_r^L \{R_n(F_2) + R_n(F_2 + p) + \ldots + R_n(F_2 + (m-1)p)\}\,dx$.

For $0 \le x \le r$, $0 \le F_2 \le 1 - mp$,
so $R_n(F_2) + R_n(F_2 + p) + \ldots + R_n(F_2 + mp) \ge R_n(p) + R_n(2p) + \ldots + R_n(mp)$.
For $r \le x \le L$, $1 - mp \le F_2 \le p$,
so $R_n(F_2) + R_n(F_2 + p) + \ldots + R_n(F_2 + (m-1)p) \ge R_n(p) + R_n(2p) + \ldots + R_n(mp)$,
so $w''_n \ge r \sum_{i=1}^{m} R_n(ip) + (L - r) \sum_{i=1}^{m} R_n(ip) = L \sum_{i=1}^{m} R_n(ip)$.
The equation of the limiting curve is therefore

    $\sum_{i=1}^{m} R_n(ip) = \dfrac{1}{t}$

for $\dfrac{1}{m+1} \le p \le \dfrac{1}{m}$, i.e. for $\dfrac{1}{\sum_{i=1}^{m} R_n\left(\frac{i}{m+1}\right)} \le t \le \dfrac{1}{\sum_{i=1}^{m} R_n\left(\frac{i}{m}\right)}$.

Note that $t = \dfrac{1}{\sum_{i=1}^{m} R_n(i/m)}$ is on the curve for all integral $m$.

As $m \to \infty$, $\dfrac{1}{m} \sum_{i=1}^{m} R_n\left(\dfrac{i}{m}\right) \to \int_0^1 R_n(x)\,dx > 0$, so that $\dfrac{1}{\sum_{i=1}^{m} R_n(i/m)} \to 0$,
and the curve starts at $t = 0$, $p = 0$.


Again the inequality found is the 'best possible'

The inequality is the best possible of its kind. For consider the frequency distribution
($\epsilon > 0$) for which

    $F = 0$,    $x < 0$,
    $F = p$,    $0 \le x < L(1 + \epsilon)$,
    $F = 2p$,   $L(1 + \epsilon) \le x < 2L(1 + \epsilon)$,
    ...
    $F = mp$,   $(m-1)L(1 + \epsilon) \le x < mL(1 + \epsilon)$,
    $F = 1$,    $mL(1 + \epsilon) \le x$.

By the formula for mean range given above

    $w_n = \int_{-\infty}^{\infty} R_n(F)\,dx = L(1 + \epsilon) \sum_{i=1}^{m} R_n(ip)$,

so, for this distribution, if $t = L/w_n$,

    $t^{-1} = (1 + \epsilon) \sum_{i=1}^{m} R_n(ip)$,

and this point can, for sufficiently small $\epsilon$, be as near the limiting curve as we please.


IV. DERIVATION OF THE INEQUALITY FOR UNIMODAL 
SYMMETRICAL DISTRIBUTIONS 


There will be some differences between the notation used in this section and that of the
preceding section.

The inequality gives the lower bound to the fraction, $2p$, of a distribution, falling in an
interval $(\mu - tw_n, \mu + tw_n)$, where $\mu$ is the mean of the distribution. In fact what we do is to
find the parametric equations to the curves of Fig. 2. As in § III, we find the upper bound for
$t$ for a given $p$ ($0 < p < \tfrac12$). We thus obtain a curve in the $(t, p)$ plane for each $n$. Then the
same curve gives the lower bound to $p$ for each $t$, since it is monotonic.

Suppose $L = tw_n$ and $L$ is fixed. Consider unimodal symmetrical distributions such that:

(a) their means are at the centre of an interval of length $2L$,

(b) exactly $2p$ of the distribution lies in this interval.

We must find the minimum of $w_n$ for distributions satisfying these conditions.

Since the distributions considered are unimodal and symmetrical, they can only have a
saltus (where there is a finite fraction of the distribution concentrated at a single point) at
the mean. Take the origin at the mean. Elsewhere $f(x) = F'(x)$ is finite. We only need
consider positive $x$.







Now suppose $h$ is a number less than or equal to $p/L$. Consider, for the moment, only
distributions satisfying $f(L) = h$. In order that $\int_L^{\infty} R_n(F)\,dx$ should be a minimum, $F(x)$ must
be as large as possible for all $x$ exceeding $L$, but $F(L) = \tfrac12 + p$, which is fixed. Clearly, then,
the distribution maximizing $F$, and so minimizing $R_n(F)$, is rectangular for $x$ greater than $L$,
i.e. if $p + q = \tfrac12$,

    $f(x) = h$ for $L \le x \le \dfrac{q}{h} + L$,

    $f(x) = 0$ for $\dfrac{q}{h} + L < x$.

Consider now $0 \le x \le L$. As $x$ decreases from $L$ to 0, $F$ must decrease as little as possible.
Consequently to minimize $R_n(F)$ the distribution must be rectangular in this interval, and
finally there must be a saltus at the origin of amount $2(p - hL)$.

It remains to find the value of $h$ which minimizes $w_n$ for the type of distribution given.

Move the origin to the start of this distribution:

    $w_n = 2\int_0^{(q + hL)/h} \{1 - F^n - (1 - F)^n\}\,dx$

        $= 2\int_0^{(q/h) + L} \{1 - (hx)^n - (1 - hx)^n\}\,dx$

        $= \dfrac{2}{h}\left[q + hL - \dfrac{1}{n+1}\{1 + (q + hL)^{n+1} - (1 - q - hL)^{n+1}\}\right]$.

Let $hL = x$, $L/w_n = t$; then

    $\dfrac{t^{-1}}{2} = \dfrac{1}{x}\left[q + x - \dfrac{1}{n+1}\{1 + (q + x)^{n+1} - (1 - q - x)^{n+1}\}\right]$,   (1)

so

    $\dfrac{x^2}{2}\dfrac{d}{dx}(t^{-1}) = -q + \dfrac{1}{n+1}\{1 + (q + x)^{n+1} - (1 - q - x)^{n+1}\} - x\{(q + x)^n + (1 - q - x)^n\}$.   (2)

For minimum $w_n$, i.e. minimum $1/t$, we must have $\dfrac{d}{dx}(t^{-1}) = 0$.

Now

    $\dfrac{d}{dx}\left\{\dfrac{x^2}{2}\dfrac{d}{dx}(t^{-1})\right\} = nx\{(1 - q - x)^{n-1} - (q + x)^{n-1}\} > 0$,

so $\dfrac{x^2}{2}\dfrac{d}{dx}(t^{-1})$ is a monotonic increasing function.

Also when $x = 0$,

    $\dfrac{x^2}{2}\dfrac{d}{dx}(t^{-1}) = -q + \dfrac{1}{n+1}\{1 - (1 - q)^{n+1} + q^{n+1}\}$.   (3)

When $q = 0$ the right-hand side of (3) is 0, and its differential coefficient with respect to
$q$ is $-R_n(q) < 0$.

Hence

    $-q + \dfrac{1}{n+1}\{1 - (1 - q)^{n+1} + q^{n+1}\} < 0$,

so $\dfrac{x^2}{2}\dfrac{d}{dx}(t^{-1})$ is negative when $x$ is small, since $q \ne 0$.

Since $x^2\dfrac{d}{dx}(t^{-1})$ and $\dfrac{d}{dx}(t^{-1})$ have the same zeros if $x \ne 0$, $\dfrac{d}{dx}(t^{-1})$ has either one or no
zeros in $(0, p)$.

If $\dfrac{d}{dx}(t^{-1})$ is negative when $x = p$, the minimum of $1/t$ is at $x = p$. Hence if

    $-q + \dfrac{1}{n+1} - \dfrac{p}{2^{n-1}} \le 0$,

i.e. if $p \le \dfrac{n-1}{n+1} \cdot \dfrac{2^{n-2}}{2^{n-1} - 1}$, the equation of the limiting curve will be $\dfrac{1}{2t} = \dfrac{n-1}{2(n+1)p}$, i.e.

    $p = \dfrac{n-1}{n+1}\,t$.   (4)


If $p > \dfrac{n-1}{n+1} \cdot \dfrac{2^{n-2}}{2^{n-1} - 1}$, then $\dfrac{d}{dx}(t^{-1})$ will have a zero in $(0, p)$. To obtain the equation of the
limiting curve in this case we have to combine the two equations

    $\dfrac{d}{dx}(t^{-1}) = 0$

and

    $\dfrac{t^{-1}}{2} = \dfrac{1}{x}\left[q + x - \dfrac{1}{n+1}\{1 + (q + x)^{n+1} - (1 - q - x)^{n+1}\}\right]$.   (1) bis

Put $q + x = y$. Then $\dfrac{d}{dx}(t^{-1}) = 0$ gives

    $q - \dfrac{1}{n+1}\{1 + y^{n+1} - (1 - y)^{n+1}\} = (y - q)\{R_n(y) - 1\}$.   (5)

From (1) and (2) the parametric equations of the limiting curves for the case

    $p \ge \dfrac{2^{n-2}}{2^{n-1} - 1} \cdot \dfrac{n-1}{n+1}$,   $t \ge \dfrac{2^{n-2}}{2^{n-1} - 1}$

are

    $\dfrac{1}{2t} = R_n(y)$

and

    $q = y - \dfrac{1}{R_n(y)}\left\{y - \dfrac{1}{n+1}\left[1 + y^{n+1} - (1 - y)^{n+1}\right]\right\}$.

If $y$ is given any value between 0 and $\tfrac12$, we obtain a point on the curve.

As $y \to 0$, $2q - y$ is $O(y^2)$ so that for small $y$ the limiting curve gives values not far from the
curve:

    $\dfrac{1}{2t} = R_n(2q)$.

Since we have actually found distributions giving points on the limiting curve, no better
inequality of this type can be found.


Computation 
Table 2 was calculated from the formula 


1 . 
7 + £35 @ — Dart 
(n rt 11 * 2y) By) early) + (n 1 2ny) yR,(y) y(n l 2ny) -Y};- 





q= 


$R_n(y)$ can be calculated from the recurrence relation

    $R_{n+1}(y) = R_n(y) + y(1 - y)\{1 - R_{n-1}(y)\}$.


V. A DERIVATION OF THE GAUSS-WINKLER INEQUALITIES


A method similar to that used in § III above can be applied to prove the Gauss-Winkler
inequalities. Suppose $f(x) = F'(x)$ and $f_1(x) = f(x) + f(-x)$. The inequality holds for all
distributions such that $f_1(x)$ is a decreasing function of $x$.
If $\lambda_r^r$ is the $r$th absolute moment about the origin,

    $\lambda_r^r = \int_0^{\infty} f_1(x)\,x^r\,dx$.

The inequalities give the lower bound to the fraction $2p$ of a distribution lying in the
interval $(-t\lambda_r, t\lambda_r)$.

Consider distributions such that exactly $2p$ of the distribution lies in the interval $(-L, L)$.

We will find the lower bound of $\lambda_r$ for such distributions. $L$ and $p$ are considered as fixed.
Consider first distributions satisfying the additional condition

    $f_1(L) = h$   $\left(h \le \dfrac{2p}{L}\right)$.

If $2q = 1 - 2p$, $\int_L^{\infty} f_1(x)\,dx = 2q$, a constant, so to minimize $\int_L^{\infty} x^r f_1(x)\,dx$, $f_1(x)$ must be
constant for $x > L$.

Thus $f_1 = h$ for $L \le x \le \dfrac{2q}{h} + L$, $f_1 = 0$ for $L + \dfrac{2q}{h} < x$.

To minimize $\int_0^L x^r f_1(x)\,dx$, $f_1(x)$ should be constant, therefore $f_1 = h$ for $0 < x \le L$, and as
$h \le \dfrac{2p}{L}$ there is a saltus in $F$ of amount $2p - Lh$ at the origin.

Now we must find $h$ to minimize $\int_0^{\infty} f_1(x)\,x^r\,dx$:

    $\lambda_r^r = \int_0^{L + (2q/h)} h x^r\,dx = \dfrac{h}{r+1}\left(L + \dfrac{2q}{h}\right)^{r+1}$,

    $\dfrac{d\lambda_r^r}{dh} = \dfrac{1}{r+1}\left(L + \dfrac{2q}{h}\right)^r\left(L - \dfrac{2qr}{h}\right)$,

and $\dfrac{d^2\lambda_r^r}{dh^2}$ is positive, so $\lambda_r^r$ is a minimum when $h = \dfrac{2qr}{L}$.

Since in any case $h \le \dfrac{2p}{L}$ the inequality takes two different forms. If $2qr \ge 2p$, i.e. if $2p \le \dfrac{r}{r+1}$,
the minimum is obtained when $h = \dfrac{2p}{L}$. In this case minimum $\lambda_r^r = \dfrac{L^r}{(r+1)(2p)^r}$, since
$L + \dfrac{2q}{h} = \dfrac{L}{2p}$.

If $2p > \dfrac{r}{r+1}$, minimum $\lambda_r^r = 2q\left(\dfrac{r+1}{r}\right)^r L^r$; so if $L/\lambda_r = t$, the equations of the limiting curves
will be:

    $2p = \dfrac{t}{(r+1)^{1/r}}$   for $2p \le \dfrac{r}{r+1}$,

    $2q = \left(\dfrac{r}{(r+1)\,t}\right)^r$   for $2p \ge \dfrac{r}{r+1}$.
In particular, if the distribution is symmetrical, the mode coincides with the mean. If
the origin is at the mean, $\lambda_2^2 = \sigma^2$, and we obtain the familiar inequalities

    $2q \le 1 - \dfrac{t}{\sqrt{3}}$,   if $t \le \dfrac{2}{\sqrt{3}}$,

    $2q \le \dfrac{4}{9t^2}$,   if $t \ge \dfrac{2}{\sqrt{3}}$.

Since during the course of the proof, we found distributions satisfying the equality, no
better inequality is possible.
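
To give a feel for how much is gained by assuming a unimodal symmetrical distribution, the two familiar bounds can be tabulated side by side. The sketch below is an illustration only; the function names are chosen here, and the second function is the case $r = 2$ of the inequalities of this section.

```python
import math

def chebyshev_tail_bound(t):
    """Upper bound on the fraction outside (mu - t*sigma, mu + t*sigma), any distribution."""
    return min(1.0, 1.0 / t**2)

def gauss_tail_bound(t):
    """Upper bound on 2q for a unimodal symmetrical distribution (case r = 2)."""
    if t <= 2.0 / math.sqrt(3.0):
        return 1.0 - t / math.sqrt(3.0)
    return 4.0 / (9.0 * t**2)

for t in (1.0, 2.0, 3.0):
    print(t, round(chebyshev_tail_bound(t), 4), round(gauss_tail_bound(t), 4))
```

The two branches of the second bound meet at $t = 2/\sqrt{3}$, where both give $2q \le \tfrac13$.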


SUMMARY 


Two inequalities are found in terms of mean range of samples of n (n = 2, 3, 4, ...). The first 
is true, as is Tchebycheff’s, for any frequency distribution whatsoever. The second holds for 
any unimodal symmetrical distribution. Both are shown to be the best possible of their type. 
Diagrams are given to facilitate the use of the inequalities for the cases n = 2, 3, ..., 9, 10. 
A derivation is also given of the Gauss-Winkler inequalities analogous to that used for 
inequalities in terms of mean range. 


This paper was written in the course of my work at the Ministry of Supply (S.R. 17). 
My attention was drawn to this problem by a note Q.C./R/11 (issued by S.R.17) by 
G. A. Barnard, to whom I offer my thanks. 
TABLES FOR TESTING THE HOMOGENEITY OF A SET
OF ESTIMATED VARIANCES


COMPUTED BY CATHERINE M. THOMPSON AND MAXINE MERRINGTON
PREFATORY NOTE BY H. O. HARTLEY AND E. S. PEARSON


1. HISTORICAL NOTE


The statistical analysis of data often leads to the calculation of a number of estimated
variances of which it is desirable to test the homogeneity. The present tables have been
computed to facilitate the application of this test.

As far as we are aware, the original test of this nature was that obtained by J. Neyman
& E. S. Pearson (1931), who suggested the use of a criterion $L_1$, which was the ratio of a
weighted geometric to a weighted arithmetic mean of the mean squares from which the
variances were estimated. On the assumption that variation followed the normal law, these
authors: (a) gave the sampling moments of $L_1$, if the hypothesis of equal sampling variances
was true; (b) showed that in the case of large samples $-N \log L_1$ was distributed as $\chi^2$ with
$k - 1$ degrees of freedom (where $N$ was the total number of observations and $k$ the number of
separate estimates of variance); (c) suggested a method of calculating approximate
probability levels for $L_1$ in the case of small samples.

Following this line of attack, other contributions were made by B. L. Welch (1935, 1936),
who showed how $L_1$ could be generalized and the weighting, chosen for the different sums of
squares, modified; by P. P. N. Nayer (1936), who computed tables of probability levels of $L_1$
for the case of equal samples; and by U. S. Nair (1938), who investigated the form of the true
distribution of $L_1$.

Meanwhile M. S. Bartlett (1937), approaching from another angle, suggested an analogous
test in which the sums of squares were weighted with their appropriate degrees of freedom
instead of with the number of observations as in the Neyman-Pearson criterion. Thus if
$s_t^2$ is the usual unbiased estimate of $\sigma_t^2$, based on a sum of squares having $\nu_t$ degrees of
freedom, and there are $k$ of these estimates from independent sets of observations, Bartlett
took as his test function

    $-2 \log \mu = N \log \left\{\sum_{t=1}^{k} \nu_t s_t^2 / N\right\} - \sum_{t=1}^{k} (\nu_t \log s_t^2)$,   (1)

where

    $N = \sum_{t=1}^{k} \nu_t$,   (2)

and natural logarithms to base $e$ are used. Provided that none of the degrees of freedom $\nu_t$
are too small, $-2 \log \mu$ is distributed approximately as $\chi^2$ with $k - 1$ degrees of freedom if
the null hypothesis is true, i.e. if the $\sigma_t^2$ ($t = 1, 2, \ldots, k$) have a common value. For small
samples, Bartlett introduced the corrective factor

    $C = 1 + \dfrac{1}{3(k-1)} \left\{\sum_{t=1}^{k} \dfrac{1}{\nu_t} - \dfrac{1}{N}\right\}$,   (3)

and showed that the quantity $-(2 \log \mu)/C$ followed approximately the same $\chi^2$ distribution
law.

A comparison of these tests was given by D. J. Bishop & U. S. Nair (1939), who showed
that even using the $C$ correction the $\chi^2$ approximation is not altogether satisfactory if some
of the degrees of freedom, $\nu_t$, are 1, 2 or 3.

In a later paper H. O. Hartley (1940) derived another method of approximating to the
distribution of Bartlett's $-2 \log \mu$, in which the probability integral is represented as a
weighted mean of $\chi^2$ integrals. This approximation is sufficiently accurate to allow the
degrees of freedom to drop to 2; even if some estimates of variance based on 1 degree of
freedom are included among the $k$ values, the approximation is still fair.

The tables published below are based on Hartley’s approximation; in presenting them to 
statisticians for general use it is hoped to render this test both more convenient and more 
accurate. 


2. GENERAL SCHEME OF THE NEW TABLES 


It is supposed that the data fall into $k$ groups within each of which a random variable $x$ is
normally distributed with variance $\sigma_t^2$ ($t = 1, 2, \ldots, k$). $s_t^2$ is the usual, unbiased sample
estimate of $\sigma_t^2$ based on a sum of squares having $\nu_t$ degrees of freedom. The question at issue
is whether the data are heterogeneous as to variance or whether they are consistent with the
hypothesis that all $\sigma_t^2$ ($t = 1, 2, \ldots, k$) have a common, if unknown, value.

The test is carried out by calculating

    $M = N \log_e \left\{\sum_{t=1}^{k} (\nu_t s_t^2) / N\right\} - \sum_{t=1}^{k} (\nu_t \log_e s_t^2)$,*   (4)

where

    $N = \sum_{t=1}^{k} \nu_t$.

Hartley (1940) has shown that if there is a common variance, the probability distribution
of $M$ can be closely described in terms of three parameters, namely $k$, $c_1$ and $c_3$, where

    $c_1 = \sum_{t=1}^{k} \dfrac{1}{\nu_t} - \dfrac{1}{N}$,   (5)

    $c_3 = \sum_{t=1}^{k} \dfrac{1}{\nu_t^3} - \dfrac{1}{N^3}$.   (6)
Tables 1 and 2 below enable the 5 % and 1 % significance levels of $M$ to be obtained. They
are tables of double entry for $k$ and $c_1$; for each combination of these two quantities it will be
seen that there are two entries denoted by (a) and (b). These are approximately maximum
and minimum values of the true percentage point which will normally have an intermediate
value, dependent on $c_3$. Provided the degrees of freedom are not very unequal, the correct
value of $M$ will be close to the entry opposite (a).

The tables have been arranged to make their use as simple as possible. If in the table of
5% points, say, all entries in the lines for the appropriate $k$ are greater than the value of $M$
derived from the data, this value is not significant at the 5% level. On the other hand, if
the calculated $M$ is larger than all entries for that $k$, then $M$ is significant at the 5% level.
In neither case is it necessary to calculate $c_1$ or $c_3$. When, however, $M$ falls within the range
of values shown in the lines for the particular $k$, it is necessary to calculate $c_1$ from equation
(5). Knowing this value, it will usually be possible to form an opinion on the significance
of $M$ without proceeding to the calculation of $c_3$ needed for interpolation between the entries
(a) and (b). A description of this more refined procedure is, however, given in § 4 below.

* We propose to use the single letter $M$ in place of Bartlett's $-2\log_e \mu$.

It will be noted that the entries in the tables under $c_1 = 0$ are simply the 5% and 1% probability
levels of $\chi^2$ with $k-1$ degrees of freedom, this being the limiting form approached
when all the $\nu_t$ are large. On the other hand, for a given $k$, $c_1$ has its maximum value when
all $\nu_t$ are unity and therefore $c_1 = k - 1/k$.*


3. AN ILLUSTRATIVE EXAMPLE 


The use of the table is best demonstrated in terms of an example. Below is shown (col. 3)
a set of ten estimates of variance, calculated from ten samples of weight records of schoolboys
of similar age, but from different forms. It is desired to test whether there are any real 'form
differences' in the weight dispersion of the boys. To this end we set out the calculations of $M$
as shown below:

   (1)         (2)            (3)             (4)        (5)              (6)              (7)
 Form no.   No. of boys    Estimated        $\nu_t$   $\log_e s_t^2$   $\nu_t \log_e s_t^2$   $1/\nu_t$
   $t$        $n_t$        variance
                           $s_t^2$ (lb.$^2$)

    1          10             51               9        3.93             35.4            0.111
    2          15             78              14        4.36             61.0            0.071
    3          21             91              20        4.51             90.2            0.050
    4          23             52              22        3.95             86.9            0.045
    5          15            101              14        4.62             64.7            0.071
    6          11             36              10        3.58             35.8            0.100
    7          31             41              30        3.71            111.3            0.033
    8          15             76              14        4.33             60.6            0.071
    9           3             64               2        4.16              8.3            0.500
   10           6             93               5        4.53             22.6            0.200

 Totals       150                            140 (= N)                  576.8            1.252

We obtain further:

$$\sum_t \nu_t s_t^2 = 9176, \qquad \sum_t (\nu_t s_t^2)/N = 65.54, \qquad \log_e\Big\{\sum_t \nu_t s_t^2/N\Big\} = 4.183.$$

Hence $M = 140 \times 4.183 - 576.8 = 8.8$.


The observed value of the 'variance dispersion', $M$, is therefore 8.8. This has to be compared
with the appropriate tabulated 5% (or 1%) point. It is seen from Table 1 that all
entries opposite $k = 10$ are greater than 8.8. Without further calculation it may therefore
be concluded that $M$ is not significant at the 5% level, and we may infer that no real differences
are indicated in the weight dispersion among the ten forms of schoolboys.
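The arithmetic of this example can be retraced with a short script. The following is only a sketch of equations (4), (5) and (6) applied to cols. 3 and 4 of the table; the function name `bartlett_m` is ours and not part of the paper. Because it uses full-precision logarithms instead of the two-decimal values of col. 5, the value of $M$ it prints may differ slightly from the hand-computed 8.8.

```python
import math

# Variance estimates s_t^2 (lb.^2) and degrees of freedom nu_t,
# copied from cols. 3 and 4 of the illustrative table.
s2 = [51, 78, 91, 52, 101, 36, 41, 76, 64, 93]
nu = [9, 14, 20, 22, 14, 10, 30, 14, 2, 5]


def bartlett_m(s2, nu):
    """Return (M, c1, c3) as defined in equations (4), (5) and (6)."""
    n_total = sum(nu)                                        # N = sum of nu_t
    pooled = sum(v * s for v, s in zip(nu, s2)) / n_total    # sum(nu_t s_t^2) / N
    m = n_total * math.log(pooled) - sum(v * math.log(s) for v, s in zip(nu, s2))
    c1 = sum(1 / v for v in nu) - 1 / n_total
    c3 = sum(1 / v**3 for v in nu) - 1 / n_total**3
    return m, c1, c3


m, c1, c3 = bartlett_m(s2, nu)
print(f"N = {sum(nu)}, M = {m:.2f}, c1 = {c1:.2f}, c3 = {c3:.2f}")
```

The same function also supplies the $c_1$ and $c_3$ needed when $M$ falls inside the tabulated range.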

Had the observed value of $M$ been 18.8 (instead of 8.8), the decision as to its significance
would not have been obvious, since some of the 5% points tabulated in the lines for $k = 10$
are smaller than 18.8.† It is now necessary to calculate $c_1$, defined in equation (5). Using the
reciprocals of $\nu_t$ given in col. 7 of the table above, it is found that

$$c_1 = 1.25.$$

Since the percentage points (a) and (b) for $k = 10$ and both $c_1 = 1.0$ and $1.5$ are less than
18.8, we can say that $M$ would now be significant at the 5% level.

* Actually, the last entry in each line has been computed from the approximating function, putting $c_1 = k$.
† Reference to Table 2 shows, however, that $M$ cannot be significant at the 1% level.

Had the data given a value of 17.6 for $M$, which lies between the four appropriate tabled
entries, it would normally have satisfied our purpose merely to note that $M$ was on the
border line of 5% significance. If more precise information is needed, it will be necessary
to proceed further by calculating $c_3$ and interpolating as indicated in the following section.


4. DEFINITION OF THE TABLE ENTRIES (a) AND (b); THE USE OF $c_3$

It can be shown that, for a given $k$ and $c_1$, the range of $c_3$ is (to a first order of approximation)

$$c_3(a) = c_1^3/k^2 < c_3 < c_1 = c_3(b). \qquad (7)$$

The lower bound is approached when all values of $\nu_t$ are equal and the upper bound when
$j$, say, of the $\nu_t$ are each equal to unity and the $k-j$ remaining values all tend to infinity.
In Tables 1 and 2 the entry for the percentage point opposite (a) is that for $c_3 = c_3(a)$; that
opposite (b) is for $c_3 = c_3(b)$. It will be seen that, at any rate throughout the tabulated range
of values, the entry (a) is greater than or equal to (b). In using the former, therefore, we shall
in rare cases fail to detect the significance of $M$.

If interpolation for $c_3$ is decided on, use may be made of the auxiliary Table 3. This gives,
for all the marginal entries $k$ and $c_1$ of Tables 1 and 2, the two quantities

$$C = c_3(a) = c_1^3/k^2, \qquad \Delta C = c_3(b) - c_3(a) = c_1 - c_1^3/k^2. \qquad (8)$$

The procedure would then be first to interpolate linearly in the two nearest $c_1$ columns between
the two percentage points (a) and (b), using the formula:

Percentage point corresponding to $c_3$

$$= \frac{1}{\Delta C}\{(c_3(b) - c_3) \times \text{entry (a)} + (c_3 - c_3(a)) \times \text{entry (b)}\}, \qquad (9)$$

and then interpolating to the correct value of $c_1$.


Example. Suppose that $k = 10$ and the degrees of freedom are the ten values of $\nu_t$ given in col. 4 of the
illustrative data tabled above. Here

$$c_1 = 1.25, \qquad c_3 = 0.14.$$

For the interpolation process, we need the following entries:

                               $c_1 = 1.0$    $c_1 = 1.5$
 From Table 3   $C$               0.01           0.03
                $\Delta C$        0.99           1.47
 From Table 1   Entry (a)        17.54          17.83
                Entry (b)        17.17          17.29

Hence the 5% point corresponding to $c_1 = 1.0$, $c_3 = 0.14$ is approximately, from equation (9),

$$\frac{1}{0.99}\{0.86 \times 17.54 + 0.13 \times 17.17\} = 17.49.$$

The 5% point corresponding to $c_1 = 1.5$ and $c_3 = 0.14$ will be approximately

$$\frac{1}{1.47}\{1.36 \times 17.83 + 0.11 \times 17.29\} = 17.79.$$

Interpolating between these two values for $c_1 = 1.25$, we find finally a 5% point for $M$ at 17.64. It will be
seen that this value differs very little from that of 17.68 obtained by using the (a) entries only.
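The two-stage interpolation of this example may be sketched in code as follows. The function names are ours, and the bounds $c_3(a)$ and $c_3(b)$ are computed directly from equation (7) instead of being read from Table 3, so the intermediate figures are carried to more decimals than in the hand calculation.

```python
def c3_bounds(c1, k):
    """Return (c3(a), c3(b)) for given c1 and k, following equation (7)."""
    return c1**3 / k**2, c1


def interpolate_c3(entry_a, entry_b, c1, c3, k):
    """Linear interpolation in c3 between entries (a) and (b), equation (9)."""
    low, high = c3_bounds(c1, k)
    return ((high - c3) * entry_a + (c3 - low) * entry_b) / (high - low)


k, c1, c3 = 10, 1.25, 0.14

# Entries (a) and (b) of Table 1 (5% points) for k = 10 at c1 = 1.0 and 1.5.
point_10 = interpolate_c3(17.54, 17.17, 1.0, c3, k)
point_15 = interpolate_c3(17.83, 17.29, 1.5, c3, k)

# Second stage: linear interpolation in c1 between the two columns.
weight = (c1 - 1.0) / (1.5 - 1.0)
point = (1 - weight) * point_10 + weight * point_15
print(round(point_10, 2), round(point_15, 2), round(point, 2))
```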


5. ACCURACY OF THE APPROXIMATION 


To test the accuracy of Hartley's approximation, we may compare the present tables with
the values worked out by Bishop & Nair (1939). Some of these values were calculated from
Nair's (1938) exact expansion applicable to the special case where all $\nu_t$ are equal; some were
obtained by fitting a type I curve to the distribution of $L_1$, using a formula for its moments
given by Welch (1936).

We deal first with the special case of $\nu_t = \nu$ for all $t$. In this case the parameter $c_3$ is very
close to the value $c_1^3/k^2$, which is the one used for the percentage points (a). The comparisons
are summarized in the table below:

                                        5% points for M            1% points for M
  k     ν     c_1     c_3     N      Bishop & Nair   Hartley    Bishop & Nair   Hartley

  3     2     1.33    0.37     6        7.11*          7.05        10.74*        10.57
        3     0.89    0.11     9        6.80†          6.79        10.43†        10.32
        4     0.67    0.05    12        6.62*          6.61        10.13*        10.10
        9     0.30    0.00    27        6.30†          6.28         9.67†         9.64

  5     2     2.40    0.62    10       11.09*         11.01        15.32*        15.15
        3     1.60    0.18    15       10.67†         10.62        14.91†        14.76
        4     1.20    0.08    20       10.38*         10.37        14.47*        14.46
        9     0.53    0.01    45        9.93†          9.90        13.86†        13.84

 10     2     4.95    1.25    20       19.62*         19.45        24.90*        24.65
        3     3.30    0.37    30       18.82†         18.79        24.09†        23.97
        4     2.48    0.16    40       18.42*         18.38        23.34*        23.49
        9     1.10    0.01    90       17.64†         17.60        22.48†        22.53

 * Calculated from Nair's exact distribution.   † Calculated by fitting type I curve to $L_1$.


The second decimal of the results calculated from Bishop & Nair's three-figure table is
not always reliable. In view of this, the agreement for $\nu \geq 3$ is very good and that for $\nu = 2$
is certainly better than that with Bartlett's approximation, given in Table 16 of Bishop &
Nair's paper.

For $\nu = 1$ the approximation breaks down; for example, for $k = 4$, $\nu = 1$ we have:

                              5% point    1% point
 Hartley's approximation        9.0         11.8
 Nair's expansion              10.0         14.1


Next, we may make a few comparisons for the case of five estimates of variance having
unequal degrees of freedom. In this general case an exact answer is no longer available for
comparison and Bishop & Nair's values are, throughout, those obtained by fitting a type I
curve to the distribution of $L_1$. The comparisons are summarized in the table below:

                                                      5% point                  1% point
 ν_1   ν_2   ν_3   ν_4   ν_5    N     c_1     c_3    Bishop & Nair  Hartley    Bishop & Nair  Hartley

   6     6     4     2     2    20    1.53    0.27      10.59        10.54        14.80        14.62
  16    16     9     2     2    45    1.21    0.25      10.35        10.30        14.46        14.31
   5     5     4     3     3    20    1.27    0.11      10.43        10.41        14.59        14.51
  14    14     9     4     4    45    0.73    0.03      10.05        10.04        14.05        14.03

Again we see that where all degrees of freedom $\nu_t$ are greater than or equal to 3 the approximation
is very good; where some of the degrees of freedom are as small as 2, the approximation
is still adequate.

It must be noted that, throughout, the approximation has a systematic bias in that the
values are consistently smaller than the exact ones or those obtained by fitting a type I
curve. It is because of this systematic bias that the percentage point tabulated under (a) is
sometimes actually nearer to the exact value than the one obtained by interpolation between
the percentage points (a) and (b).

The question of whether linear interpolation between the percentage points is justified
is not important where the systematic bias in the approximation is large. It will be noted
that for all $\nu_t \geq 4$, when the approximation is expected to yield good results, linear interpolation
between (a) and (b) gives the correct answer to about two-decimal accuracy. However,
in these cases the interpolate is near to the percentage point (a), so that any second
order term in the interpolation formula would have a small effect in any case.


6. THE CALCULATION OF THE PRESENT TABLES 


The calculation of the present tables has been carried out according to formula (20), given
by Hartley (1940). The values of the probability integral of $\chi^2$, $P_n(\chi^2)$, were obtained from
Table 12 of Tables for Statisticians and Biometricians, vol. 1 (1930, 3rd ed.). It was found
necessary, however, to extend these tables beyond their present range of both $\chi^2$ and $n$.
This was done with the help of Molina's tables (1942), using the identity relation between
the Poisson distribution and the $\chi^2$ integral. These extended tables are available in
manuscript at the Department of Statistics, University College.
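The identity referred to can be illustrated for an even number of degrees of freedom $n$, where the upper tail of $\chi^2$ at the point $x$ equals the probability that a Poisson variable with mean $x/2$ takes a value less than $n/2$. The sketch below is our own illustration of that standard identity, not the computing scheme actually used for the tables; the function name is hypothetical.

```python
import math


def chi2_upper_tail_even(x, n):
    """P(chi-square with n d.f. exceeds x) for even n, written as a Poisson sum."""
    if n <= 0 or n % 2:
        raise ValueError("this form of the identity needs an even, positive n")
    mean = x / 2  # mean of the associated Poisson distribution
    return sum(math.exp(-mean) * mean**j / math.factorial(j) for j in range(n // 2))


# The familiar 5% points of chi-square for 2 and 4 degrees of freedom.
tail_2 = chi2_upper_tail_even(5.991, 2)
tail_4 = chi2_upper_tail_even(9.488, 4)
print(round(tail_2, 3), round(tail_4, 3))
```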

We should like to record our appreciation of the extensive work undertaken by Miss
Catherine M. Thompson (now Mrs Grylls) and Mrs Maxine Merrington in computing the
tables.


THE DESIGN OF OPTIMUM MULTIFACTORIAL EXPERIMENTS
By R. L. PLACKETT AND J. P. BURMAN


1. INTRODUCTION


A problem which often occurs in the design of an experiment in physical or industrial research
is that of determining suitable tolerances for the components of a certain assembly; more
generally of ascertaining the effect of quantitative or qualitative alterations in the various
components upon some measured characteristic of the complete assembly. It is sometimes
possible to calculate what this effect should be; but it is to the more general case when this
is not so that the methods given below apply. In such a case it might appear to be best to
vary the components independently and study separately the effect of each in turn. Such
a procedure, however, is wasteful either of labour or accuracy, while to carry out a complete
factorial experiment (i.e. to make up assemblies of all possible combinations of the $n$ components)
would require $L^n$ assemblies, where $L$ is the number of values (assumed constant)
at which each component can appear. For $L$ equal to 2 this number is large for moderate $n$
and quite impracticable for $n$ greater than, say, 10. For larger $L$ the situation is even worse.
What is required is a selection of $N$ assemblies from the complete factorial design which will
enable the component effects to be estimated with the same accuracy as if attention had been
concentrated on varying a single component throughout the $N$ assemblies. Designs are given
below for $L = 2$ and all possible $N \leq 100$ except $N = 92$ (as yet not known), and for
$L = 3, 4, 5, 7$ when $N = L^r$ (for all $r$).

The following results have been obtained:

(a) When each component appears at $L$ values, all main effects may be determined with
the maximum precision possible using $N$ assemblies, if, and only if, $L^2$ divides $N$, and certain
further conditions are satisfied.

(b) For $L = 2$, the solution of the problem is for practical purposes complete. In designs
of the form $N = L^r$, the effects of certain interactions between the components may also be
estimated with maximum precision.

The precision naturally increases with the number of assemblies measured, and to this
extent depends on the judgement of the experimenter. Before explaining the procedure in
detail, some introductory remarks are necessary on the assumptions made and the method
of least squares.


2. EXPERIMENTAL EFFECTS WHEN L = 2 


Each component in the assembly appears at two values throughout; it will be convenient
to call one of them the nominal and the other the extreme, where the former usually refers
to the actual nominal value and the latter to an extreme of the tolerance range for the component
in question (the same extreme for each appearance of the component in a given
experiment). Denote the measurable characteristics of the components in the assembly
(one per component) by $x_1, x_2, \ldots, x_n$ and the measured assembly characteristic by $y$.
Then

$$y = y(x_1, x_2, \ldots, x_n),$$

where the functional relationship is in general unknown. Suppose that the nominal value
of $x_i$ is $x_i^0$ and of $y$, $y_0$. Thus $y_0 = y(x_1^0, x_2^0, \ldots, x_n^0)$.
Suppose also that the extreme value of $x_i$ under consideration is $x_i'$. Then the main effect of
component 1 is

$$m_1 = \Big[\sum y(x_1', x_2, x_3, \ldots, x_n) - \sum y(x_1^0, x_2, x_3, \ldots, x_n)\Big]\Big/2^n,$$

where the total number of possible assemblies is $2^n$. In each of the two summations above,
the indices on the $x_i$ $(i \neq 1)$ range over all possible sets of values. Similarly, $m_2, m_3, \ldots, m_n$ are
defined. For brevity the above equation will be written:

$$m_1 = \Big[\sum y(x_1') - \sum y(x_1^0)\Big]\Big/2^n,$$

and in general $\sum y(x_1' x_2^0 \ldots x_p')$ will represent the function $y$ evaluated with $x_1, x_2, \ldots, x_p$ taking
the values shown and summed over all possible sets of values of the variables that have been
suppressed. The main effect of a component is thus seen to be the mean effect on the measured
assembly characteristic which that component would produce if acting on its own. Proceeding
further we define the interaction between components $1, 2, 3, \ldots, p$ as

$$m_{(123 \ldots p)} = \Big[\sum y(x_1' x_2' \ldots x_p') - \sum\sum y(x_1' x_2' \ldots x_{p-1}' x_p^0) + \sum\sum y(x_1' x_2' \ldots x_{p-2}' x_{p-1}^0 x_p^0) + \ldots + (-1)^p \sum y(x_1^0 x_2^0 \ldots x_p^0)\Big]\Big/2^n,$$

where the inner summation is as explained above; the outer extends over the ${}_pC_1$, ${}_pC_2$, etc.,
selections of 1, 2, 3, etc., indices 0 available. The nature of an interaction has been discussed
by Fisher (1942) and others, and our definition accords with the usual one.


If main effects are regarded as being of the first order of small quantities and if the function
$y$ may be differentiated, the first approximation to $m_{(123 \ldots p)}$ is

$$m_{(123 \ldots p)} = (\partial^p y/\partial x_1\, \partial x_2 \ldots \partial x_p)(x_1' - x_1^0)(x_2' - x_2^0) \ldots (x_p' - x_p^0),$$

the derivative being averaged over the values it takes for all sets of values of the remaining components.
This shows that when the variables are measured on a continuous scale we may
validly neglect all the interactions above a certain order, for a $(p-1)$th order interaction
(one in $p$ components) is of the $p$th order of smallness. But the justification for this assumption
when some of the $x_i$ are qualitative and not quantitative (and it is frequently made)
must be found in considerations outside the data which the experiment provides, in common-sense
or philosophical grounds.

The grand mean $M = \sum y(x_1, x_2, \ldots, x_n)/2^n$, where the summation is over all possible sets of
values of the components. In the $j$th assembly of an actual experiment, some components
will be at nominal and some at their extreme values. If the true value of the assembly
characteristic is then $y_j$, it is found on solving the above equations that

$$y_j = M + a_{j1} m_1 + a_{j2} m_2 + \ldots + a_{jn} m_n + a_{j,n+1} m_{(12)} + \ldots + a_{j,2^n-1} m_{(123 \ldots n)}, \qquad (1)$$

where the coefficient of $m_i$ is $\pm 1$ according as the $i$th component is at extreme or nominal in
the $j$th assembly; the coefficient of $m_{(12 \ldots p)}$ is $\pm 1$ according as the number of plus ones among
the coefficients of $m_1, m_2, \ldots, m_p$ is odd or even. In doing this the signs of the odd-order interactions
(involving an even number of factors) have been reversed, but the notation is convenient,
for then the coefficients in $y_0$ are all minus one. It is assumed that $y_0$ is always one
of the selected assemblies, and this is no real restriction upon the design.
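Equation (1) can be checked numerically on a small case. The sketch below takes $n = 3$ and an arbitrary, purely hypothetical response function, computes the grand mean and every effect over the $2^3$ assemblies, and rebuilds each $y_j$ from them. The interaction terms are taken with the sign convention of equation (1), that is, each effect is obtained by applying its own $\pm 1$ coefficients to the responses.

```python
from itertools import combinations, product

n = 3  # number of components


def y(levels):
    """Hypothetical assembly characteristic; 0 = nominal, 1 = extreme."""
    x1, x2, x3 = levels
    return 10 + 2 * x1 - 3 * x2 + 0.5 * x3 + 1.5 * x1 * x2 - 0.25 * x1 * x2 * x3


assemblies = list(product((0, 1), repeat=n))  # all 2^n assemblies


def coeff(levels, subset):
    """Coefficient in equation (1): +1 if an odd number of the listed
    components are at their extreme value, -1 if an even number are."""
    plus_ones = sum(levels[i] for i in subset)
    return 1 if plus_ones % 2 else -1


subsets = [s for p in range(1, n + 1) for s in combinations(range(n), p)]
grand_mean = sum(y(a) for a in assemblies) / 2**n
effects = {s: sum(coeff(a, s) * y(a) for a in assemblies) / 2**n for s in subsets}


def rebuilt(levels):
    """Right-hand side of equation (1) for one assembly."""
    return grand_mean + sum(coeff(levels, s) * effects[s] for s in subsets)


for a in assemblies:
    print(a, round(y(a), 4), round(rebuilt(a), 4))
```

For the all-nominal assembly every coefficient is $-1$, as stated in the text.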


3. LEAST SQUARES AND PRECISION 


The purpose of the experiment is to estimate those of the quantities $m$ as may not be assumed
negligible from a set of measurements $r_1, r_2, \ldots, r_N$. For this we must solve a set of $N$ linear
equations represented by (1). The equations always involve $M$, and therefore to estimate $q$
of the $m$'s it is necessary to make at least $(q+1)$ measurements. If exactly $(q+1)$ assemblies
are measured, there is a unique set of $m$'s satisfying the equations; if more than $(q+1)$ are
measured there will be no unique solution and the best estimates are, as is well known,
obtained by the method of least squares. This obtains the set of $m$'s which minimizes

$$S = \sum_j (r_j - M - a_{j1} m_1 - a_{j2} m_2 \ldots - a_{ji} m_i \ldots)^2,$$

where $r_j$ is the measurement in the $j$th assembly whose true value is $y_j$. Normally $(q+1)$ is
much less than $2^n$, all high-order interactions being neglected, so that the number of assemblies
$N$ may be made much smaller than for the complete factorial design.

As already stated, the greater the number of assemblies measured, the greater the precision
with which component effects may be estimated. On account of errors of measurement and
the neglect of certain effects the minimum $S_0$ of $S$ is not zero. In fact $S_0/(N-q-1)$ provides
an unbiased estimate $s^2$ of $\sigma^2$, the variance of error of each measurement (assumed
the same for all assemblies). The error variance in the estimation of an effect $m_i$
is of the form $\sigma^2/t_i$, where $t_i$ is called the precision constant. It depends only on the design
of the experiment, and can be increased indefinitely by increasing $N$. Our object is to find
designs which maximize all the $t_i$ simultaneously for given $N$. They will be called optimum
designs. The ratio of $m_i$ to $s/\sqrt{t_i}$ has a $t$-distribution on the null hypothesis that the true
value of $m_i$ is zero. The effect of increasing the precision is, first, to increase the power of the
$t$-test in detecting any departure of $m_i$ from zero; secondly, to increase the accuracy of its
estimation. In the designs given at the end of this paper, for $L = 2$ all main effects may be
estimated with maximum precision $N$ (given $N$ assemblies), that is, the standard error of
$m_i$ is $\sigma/\sqrt{N}$, provided $N$ is a multiple of 4. In cases where $N = 2^r$ certain interactions may also
be estimated with the maximum precision. The choice of $N$ (subject to $N \geq q+1$) will depend
on the extent to which the experimenter wishes to minimize the effect of his experimental
error.

4. REQUIREMENTS FOR OPTIMUM DESIGNS (ANY L) 

I. Consider now the case of $n$ components each of which may take $L$ values. If interactions
are neglected, the true values $y_i$ may be assumed linear functions of certain constants
representing the main effects, as was proved rigorously for the case $L = 2$. In general let
$x_{j(l)}$ represent the effect due to the $j$th component at its $l$th value. The true value of the
measurement on the $i$th assembly is

$$y_i = \sum_j x_{j(l)} \qquad (i = 1, 2, \ldots, N;\; j = 1, 2, \ldots, n;\; l = 1, 2, \ldots, L),$$

where $l$ represents the value at which the $j$th component appears in the $i$th assembly. We
now introduce certain new variables in terms of which to express the $x$, as the primary
interest is in the change of assembly characteristic caused by certain changes in the components.

Let $Q$ be a non-singular $L \times L$ matrix whose first column consists entirely of ones, such
that $Q = OD$, where $O$ is orthogonal and $D$ diagonal. The condition on the first column of
$Q$ implies that $d_{11} = \sqrt{L}$.

Let

$$U = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_L \end{pmatrix} = Q^{-1}\begin{pmatrix} x_{1(1)} \\ x_{1(2)} \\ \vdots \\ x_{1(L)} \end{pmatrix} = Q^{-1}X_1.$$

Here $u_1 = (x_{1(1)} + x_{1(2)} + \ldots + x_{1(L)})/L$ = mean effect of component 1, and $u_2, u_3, \ldots, u_L$ are constants
which determine the effect of changes in this component upon the assembly characteristic.
The orthogonality property will be used later. Therefore

$$X_1 = QU = \begin{pmatrix} u_1 + a_{12}u_2 + a_{13}u_3 + \ldots + a_{1L}u_L \\ u_1 + a_{22}u_2 + a_{23}u_3 + \ldots + a_{2L}u_L \\ \vdots \\ u_1 + a_{L2}u_2 + a_{L3}u_3 + \ldots + a_{LL}u_L \end{pmatrix},$$

where the $a_{ij}$ are certain constants. Similarly, introduce variables $v_1, v_2, v_3, \ldots, v_L$ for component 2
and write:

$$X_2 = \begin{pmatrix} x_{2(1)} \\ x_{2(2)} \\ \vdots \\ x_{2(L)} \end{pmatrix} = \begin{pmatrix} v_1 + a_{12}v_2 + a_{13}v_3 + \ldots + a_{1L}v_L \\ v_1 + a_{22}v_2 + a_{23}v_3 + \ldots + a_{2L}v_L \\ \vdots \\ v_1 + a_{L2}v_2 + a_{L3}v_3 + \ldots + a_{LL}v_L \end{pmatrix},$$

where $v_1$ is the mean effect of component 2. And so on. Hence

$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix} = AX, \quad \text{where} \quad X = \begin{pmatrix} M \\ u_2 \\ \vdots \\ u_L \\ v_2 \\ \vdots \end{pmatrix}, \qquad M = u_1 + v_1 + \ldots \text{ to } n \text{ terms.}$$

$A$ is a matrix with $N$ rows and $n(L-1)+1$ columns, the first column consisting of ones,
and the remainder consisting of the elements $a_{ij}$ belonging to $Q$. The columns fall into sets
(corresponding to the components) of $(L-1)$ after the first, and the rows of the submatrix
formed by such a set consist of repetitions of the rows of $Q$. At this stage renumber the
suffices of the $a_{ij}$ so that $A$ may be written $(a_{ij})$.

II. The vector $Y$ is known. Solving the equations by least squares (assuming
$N \geq n(L-1)+1$) gives the so-called normal equations $A'Y = A'AX = CX$ say, i.e.
$X = C^{-1}A'Y$. If $\sigma^2$ is the error variance of a single observation $y$, it is proved in text-books
that

$$\operatorname{var}(\theta_k) = |C_{kk}|\,\sigma^2/|C|,$$

where $\theta_k$ is the $k$th element of $X$ and $C_{kk}$ the cofactor of $c_{kk}$ in $C = [c_{ij}]$, $C$ being a symmetric
$n \times n$ matrix. It is required to minimize $|C_{kk}|/|C|$, i.e. to maximize $t = |C|/|C_{kk}|$ by suitable
choice of design.


Write $c_{ij}/c_{ii}^{1/2} c_{jj}^{1/2} = r_{ij}$ and the matrix $R = [r_{ij}]$, where $r_{ii} = 1$ and $r_{ij} = r_{ji}$.
Now

$$r_{ij} = \sum_r a_{ri} a_{rj} \Big/ \Big(\sum_r a_{ri}^2\Big)^{1/2} \Big(\sum_r a_{rj}^2\Big)^{1/2}.$$

If $(a_{1i}, a_{2i}, \ldots, a_{Ni})$ and $(a_{1j}, a_{2j}, \ldots, a_{Nj})$ be interpreted as the co-ordinates of two points
$P_i$ and $P_j$ in a Euclidean $N$-space, then $r_{ij} = \cos P_iOP_j$, where $O$ is the origin, and hence $r_{ij}^2 \leq 1$.

Now $t = |R|\,S_k/|R_{kk}|$, where $S_k = \sum_r a_{rk}^2$, and so $S_k$ must be fixed, otherwise $t$ may be
increased indefinitely. This is equivalent to fixing the scale of measurement, the preceding
section having dealt with the choice of origin at the mean. Eliminate the $p$th row and
column from $|R|$ and $|R_{kk}|$ by pivotal condensation: multiply the $p$th column by $r_{pj}$ $(p \neq j)$
and subtract from the $j$th column for all $j \neq p$. A row of zeros appears in the $p$th row except
in the diagonal place where there is a one. The determinants have been reduced in order by
one, and the second is still a principal minor of the first.

Thus

$$|R| = |r_{ij} - r_{ip} r_{pj}| = |(1 - r_{ip}^2)^{1/2} (1 - r_{jp}^2)^{1/2}\, r_{ij.p}| \qquad (i, j \neq p),$$

defining $r_{ij.p}$ in this manner, where the suffices appearing after the dot represent columns
that have been eliminated.

Therefore, taking out factors from rows and columns,

$$t = S_k \prod_{i \neq p} (1 - r_{ip}^2)\, |r_{ij.p}| \Big/ \prod_{i \neq p, k} (1 - r_{ip}^2)\, |r_{ij.p}|_{kk} = S_k (1 - r_{kp}^2)\, |r_{ij.p}| \big/ |r_{ij.p}|_{kk},$$

where $|r_{ij.p}|_{kk}$ denotes the principal minor obtained by omitting the $k$th row and column.
Now

$$r_{ij.p} = (\cos P_iOP_j - \cos P_iOP_p \cos P_jOP_p)/\sin P_iOP_p \sin P_jOP_p,$$

which is the formula for the cosine of the projection of angle $P_iOP_j$ on to the $(N-1)$-space
orthogonal to $OP_p$. Therefore

$$r_{ij.p}^2 \leq 1 \;\; (i \neq j) \quad \text{and} \quad r_{ii.p} = 1.$$

The method has obtained a ratio of two determinants of the same type as before, and the
process is repeated, step by step, until that in the numerator is of the form

$$\begin{vmatrix} 1 & r_{ak.p_1 p_2 \ldots} \\ r_{ak.p_1 p_2 \ldots} & 1 \end{vmatrix}$$

and the denominator is 1. Row and column $p_1, p_2, \ldots$, are eliminated in turn (no $p$ being
equal to $k$), and so

$$t = S_k (1 - r_{kp_1}^2)(1 - r_{kp_2.p_1}^2)(1 - r_{kp_3.p_1 p_2}^2) \ldots$$

This is a maximum only when $r_{kp} = 0$ for all $p \neq k$ and all $k$. For equal precision $S_k$ must
be constant for all $k$ and $t = S_k$. Therefore $A'A = C = tI$. Hence the designs for which the
maximum precision is attained are those which correspond to columns of an orthogonal
matrix (apart from an arbitrary multiplier).

At this point it is convenient to prove the formula for the error variance. Let $A$ be the
non-square matrix with orthogonal columns of the equations $Y = AX$. Introduce further
columns $U$ so that $(A, U)$ is a square orthogonal matrix, and corresponding dummy variables
whose column vector is $Z$. The least squares solution of the above equations is $X_0$, given by
$A'AX_0 = A'Y$, therefore

$$tIX_0 = A'Y, \qquad X_0 = \frac{1}{t}A'Y. \qquad (1)$$

The equations $[A, U]\begin{pmatrix} X \\ Z \end{pmatrix} = Y$ have a unique solution, and on multiplying by $\begin{pmatrix} A' \\ U' \end{pmatrix}$,

$$tI\begin{pmatrix} X \\ Z \end{pmatrix} = \begin{pmatrix} A'Y \\ U'Y \end{pmatrix},$$

so the resulting value of $X = X_0$ as before. The residual vector $E = Y - AX_0 = UZ$.

$$\text{Sum of squares of residuals} = E'E = Z'U'UZ = tZ'Z = t\sum z_i^2. \qquad (2)$$

III. Consider now any pair of components $f$ and $g$. Suppose they appear together at
their $l$th and $l'$th values respectively $w_{ll'}$ times. This defines an $L \times L$ matrix $W = [w_{ll'}]$.
The scalar product of a column of $A$ belonging to $f$ by a column of $A$ belonging to $g$ is zero
by the orthogonality of $A$. Let these correspond to the $u$th and $v$th columns of $Q$ and revert
to the old suffices of $a_{ij}$ corresponding to $Q$, i.e. $Q = [a_{ij}]$, where $i, j = 1, 2, \ldots, L$.

Then $a_{lu} w_{ll'} a_{l'v}$ in dummy suffices equals

$$Q'WQ = \begin{pmatrix} N & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix}.$$

The $N$ appears because the first column for $f$ is the same as the first column for $g$, equal to
the first column of $A$ which consists entirely of ones.

Now $Q = OD$ where $O$ is orthogonal and $D$ diagonal, therefore $Q'WQ = DO'WOD$, i.e.

$$O'WO = D^{-1}\begin{pmatrix} N & 0 \\ 0 & 0 \end{pmatrix}D^{-1} = \begin{pmatrix} N/d_{11}^2 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} N/L & 0 \\ 0 & 0 \end{pmatrix}.$$

Therefore

$$W = OO'WOO' = \begin{pmatrix} 1/\sqrt{L} & \\ 1/\sqrt{L} & \text{other terms} \\ \vdots & \\ 1/\sqrt{L} & \end{pmatrix} \begin{pmatrix} N/L & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1/\sqrt{L} & 1/\sqrt{L} & \cdots & 1/\sqrt{L} \\ & \text{other terms} & & \end{pmatrix} = \begin{pmatrix} N/L^2 & N/L^2 & \cdots & N/L^2 \\ N/L^2 & N/L^2 & \cdots & N/L^2 \\ \vdots & & & \vdots \\ N/L^2 & N/L^2 & \cdots & N/L^2 \end{pmatrix}.$$

Sum of terms in $l$th row is the number of replications of the $l$th value of $f$. Therefore

(i) Each component is replicated at each of its values the same number of times.

(ii) Each pair of components occur together at every combination of values the same
number of times.

(iii) The number of assemblies is divisible by the square of the number of values.
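Conditions (i) to (iii) are easy to test mechanically for any proposed array. The sketch below checks them for a hypothetical design with $L = 3$ and $N = 9$, whose four columns are formed from two base columns $a$, $b$ as $a$, $b$, $a+b$ and $a+2b$ (mod 3). The design and the function name are our own illustration and are not taken from the tables of the paper.

```python
from collections import Counter
from itertools import combinations, product

L = 3
# Hypothetical 9-assembly design for four 3-valued components.
design = [(a, b, (a + b) % L, (a + 2 * b) % L) for a, b in product(range(L), repeat=2)]


def is_balanced(design, L):
    """Check conditions (i), (ii) and (iii) for an N x n array of values 0..L-1."""
    N, n = len(design), len(design[0])
    if N % L**2:                       # (iii) L^2 must divide N
        return False
    for f in range(n):                 # (i) equal replication of each value
        counts = Counter(row[f] for row in design)
        if any(counts[l] != N // L for l in range(L)):
            return False
    for f, g in combinations(range(n), 2):   # (ii) every pair of values equally often
        pairs = Counter((row[f], row[g]) for row in design)
        if any(pairs[(l, m)] != N // L**2 for l in range(L) for m in range(L)):
            return False
    return True


print(is_balanced(design, L))
```

With $N = 9$ and $L = 3$ the four columns used here reach the bound $(N-1)/(L-1) = 4$ on the number of components.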

The converse, that under these conditions the matrix $A$ is orthogonal, can be proved by
reversing these steps. The actual matrix $Q$ chosen is unimportant, and the design can be
specified by means of a rectangular array with $N$ rows and $n$ columns containing $L$ different
letters $(a, b, c, \ldots, k)$ representing the $L$ values of each component. The problem is then a
purely combinatorial one. If $N = KL^2$, the maximum number of columns $n$ is

$$(KL^2 - 1)/(L - 1)$$

or its integral part, since $KL^2 \geq n(L-1)+1$. We propose to call designs of this type multifactorial
designs.

Returning to the case of $L = 2$, it is necessary to obtain an orthogonal $4K \times 4K$ matrix
$A$ whose first column consists entirely of ones. Choosing $Q = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$, the other columns
of $A$ consist of equal numbers of $+1$ and $-1$. The signs may be changed down the length of
certain columns without spoiling the design so that the first row apart from the corner element
consists entirely of $-1$. Then apart from this row (in future called the basic row) and the
first column the design consists of a square matrix with $2K$ plus and $(2K-1)$ minus ones in
each column and (by orthogonality) row, and such that each pair of columns contains a pair
of plus ones in the same row $K$ times. The estimates of component effects are obtained from
equation (1) of § 4 (II):

$$X_0 = \frac{1}{t}A'Y \qquad (\text{here } t = 4K).$$

Thus they can be evaluated by addition and subtraction with only one division. This
simplicity appears in the illustrative example given in §§ 9 and 10. The dummy variables $z_i$
are similarly evaluated and the estimated error variance is

$$s^2 = \frac{t}{N - q - 1}\sum z_i^2.$$

5. METHODS OF SOLUTION 
Certain methods of constructing orthogonal matrices with elements plus or minus one are
known (Paley, 1933). They depend upon the theory of finite fields, an outline of which will
now be given.

A field $F$ is defined as a set of quantities which is closed with respect to two operations,
addition and multiplication (i.e. if $a$, $b$ are in $F$, so are $a+b$, $ab$). These quantities satisfy the
following laws:

(i) $a+b = b+a$, $ab = ba$.
(ii) $a+(b+c) = (a+b)+c$, $a(bc) = (ab)c$.
(iii) $a(b+c) = ab+ac$.
(iv) There is an $x$ such that $a+x = b$ for every $a$, $b$.

From these it may be proved: (a) There is a unique quantity 0 such that $a+0 = a$ for all $a$.
(b) The quantity $x$ in (iv) is unique. (c) $a.0 = 0$. Finally, we add to our axioms

(v) There is a $y$ such that $ay = b$ for every $b$ and all $a \neq 0$.

Hence as before: (d) There is a unique quantity 1 such that $a.1 = a$ for all $a$. (e) There is a
unique quantity $a^{-1}$ such that $a.a^{-1} = 1$ $(a \neq 0)$. (f) The quantity $y$ in (v) is unique: $y = a^{-1}b$.

Consider the integers $0, 1, 2, \ldots, (p-1)$ where $p$ is prime, and write $a = b$ if $(a-b)$ is
divisible by $p$. Then this set of integers forms a finite field as may be easily shown. For
example, when $p = 5$, the numbers in the field are 0, 1, 2, 3, 4.

$$2+4 = 6 = 1, \qquad 2+3 = 5 = 0, \qquad 2.3 = 4.4 = 1.$$

Hence 2 and 3 are reciprocals and 4 is its own reciprocal. This field is called the Galois field of
order $p$, GF($p$).
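The arithmetic of GF(5) quoted above can be reproduced in a few lines. This is only an illustrative sketch with function names of our own choosing; the reciprocal is found by direct search, which is adequate for small $p$.

```python
p = 5  # a prime


def add(a, b):
    """Addition in GF(p)."""
    return (a + b) % p


def mul(a, b):
    """Multiplication in GF(p)."""
    return (a * b) % p


def inverse(a):
    """Reciprocal of a non-zero element of GF(p), found by direct search."""
    if a % p == 0:
        raise ZeroDivisionError("0 has no reciprocal")
    return next(y for y in range(1, p) if mul(a, y) == 1)


print(add(2, 4), add(2, 3), mul(2, 3), mul(4, 4))
print({a: inverse(a) for a in range(1, p)})
```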

Now suppose $x$ is a number algebraic over GF($p$), that is, $x$ satisfies an algebraic equation
with coefficients in GF($p$). Then it defines an algebraic extension of GF($p$), namely, all
polynomials in $x$ with coefficients in GF($p$). If $x$ satisfies an equation irreducible in GF($p$)
and of degree $n$, there are $p^n$ distinct polynomials in $x$. They are of the form

$$f(x) = a_0 + a_1 x + \ldots + a_{n-1}x^{n-1} \qquad (a_0, a_1, \ldots, a_{n-1} \text{ in GF}(p)).$$

Such an algebraic extension is, in fact, a field. Moreover, all fields of degree $n$ over GF($p$)
may be shown to be equivalent. There is a member $\alpha$ of the extended field such that
$1, \alpha, \alpha^2, \alpha^3, \ldots, \alpha^{q-2}$ $(q = p^n)$ constitute the non-zero elements of the field and $\alpha^{q-1} = 1$; an
extension of Fermat's theorem. Any one of these equivalent fields is denoted by GF($p^n$).
We shall now require various simple theorems.


DEF. If a non-zero element of a finite field is a perfect square ($a = b^2$) it is called a quadratic
residue (Q.R.) of the field. All other non-zero elements are non-Q.R.'s.

TH. 1. The numbers of Q.R.'s and non-Q.R.'s are equal ($p > 2$).

For every $b = \alpha^l$, $a = b^2 = \alpha^{2l} = \alpha^{2l - \lambda(q-1)}$ ($\lambda$ integral).

But $(q-1)$ is even ($p > 2$). Hence only even powers of $\alpha$ are Q.R.'s.
Therefore there are $\tfrac{1}{2}(q-1)$ Q.R.'s and $\tfrac{1}{2}(q-1)$ non-Q.R.'s.
We now define the Legendre function $\chi(a)$:

$\chi(0) = 0$,
$\chi(a) = +1$ when $a$ is a Q.R.,
$\chi(a) = -1$ when $a$ is a non-Q.R.

Th. 1 states that $\sum_a \chi(a) = 0$ (summation over whole field).


TH. 2. $\chi(a)\chi(b) = \chi(ab)$.

This is trivial when $a = 0$ or $b = 0$. Otherwise $a = \alpha^u$, $b = \alpha^v$ and $ab = \alpha^{u+v}$ is a Q.R. if
and only if $(u+v)$ is even, i.e. $u$ and $v$ of same parity. This proves the result.

TH. 3. $\chi(-1) = +1$ if $q = 4t+1$,
$\chi(-1) = -1$ if $q = 4t-1$,
for integral $t$.

For $\alpha^{q-1} = +1$.
Therefore $\alpha^{\frac{1}{2}(q-1)} = \pm 1 = -1$, since powers of $\alpha$ are distinct up to $\alpha^{q-2}$.
Hence $-1$ is a Q.R. if and only if $\tfrac{1}{2}(q-1)$ is even $= 2t$ and $q = 4t+1$.
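Theorems 1 to 3 can be verified by direct enumeration in the prime fields GF($p$). The sketch below is restricted to odd primes $p$ (not prime powers), and the function names are ours.

```python
def chi(a, p):
    """Legendre function on GF(p), p an odd prime: 0, +1 (Q.R.) or -1 (non-Q.R.)."""
    a %= p
    if a == 0:
        return 0
    squares = {(b * b) % p for b in range(1, p)}
    return 1 if a in squares else -1


def check_theorems(p):
    """Return the truth of Th. 1, Th. 2 and Th. 3 for GF(p)."""
    th1 = sum(chi(a, p) for a in range(p)) == 0
    th2 = all(chi(a, p) * chi(b, p) == chi(a * b, p) for a in range(p) for b in range(p))
    th3 = chi(-1, p) == (1 if p % 4 == 1 else -1)
    return th1, th2, th3


for p in (3, 5, 7, 11, 13):
    print(p, check_theorems(p))
```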


TH. 4. $\sum_j \chi(j - i_1)\,\chi(j - i_2) = -1$ (summation over all $j$ in $GF(p^n)$; $p > 2$; $i_1 \neq i_2$).

$\sum_j \chi(j - i_1)\,\chi(j - i_2) = \sum_j \chi\{(j - i_1)(j - i_2)\}$ by Th. 2.

Put $u = j - \tfrac{1}{2}(i_1 + i_2)$, $u_1 = \tfrac{1}{2}(i_1 - i_2)$, so that $(j - i_1)(j - i_2) = u^2 - u_1^2$.

Expression $= \sum_u \chi(u^2 - u_1^2)$ ($j$ is summed over whole field so $u$ will be also).

Put $u = u_1 v$ ($u_1 \neq 0$).

Expression $= \sum_v \chi\{u_1^2 (v^2 - 1)\}$ ($u$ is summed over whole field so $v$ will be also)
$= \sum_v \chi(u_1^2)\,\chi(v^2 - 1) = \sum_v \chi(v^2 - 1)$ by Th. 2.

Now if $v^2 - 1 = z^2$, $v^2 - z^2 = 1$,
$(v - z)(v + z) = 1$.
If $v + z = y$, $v - z = y^{-1}$.
Therefore $v = \tfrac{1}{2}(y + y^{-1})$, $z = \tfrac{1}{2}(y - y^{-1})$.

Hence the number of values of $v$ for which $\chi(v^2 - 1) = +1$ or $0$ is the number of values of $v$ for which $v = \tfrac{1}{2}(y + y^{-1})$.

Now if $y + y^{-1} = w + w^{-1}$, $y^2 w + w = y w^2 + y$, $(y - w)(1 - yw) = 0$.
Therefore $w = y$ or $y^{-1}$. $y$ and $y^{-1}$ are distinct unless $y = \pm 1$, $v = \pm 1$, when $\chi(v^2 - 1) = \chi(0) = 0$.
Hence to every one of the $\tfrac{1}{2}(q-3)$ reciprocal pairs $(y, y^{-1})$ with $y \neq \pm 1$ corresponds a distinct value of $v$.
Thus there are $\tfrac{1}{2}(q-3)$ values of $v$ for which $\chi(v^2 - 1) = +1$ (excluding $v = \pm 1$).
There are two values of $v$ for which $\chi(v^2 - 1) = 0$ ($v = \pm 1$).
Hence there are $\tfrac{1}{2}(q-1)$ values of $v$ for which $\chi(v^2 - 1) = -1$.

Therefore $\sum_j \chi(j - i_1)\,\chi(j - i_2) = \sum_v \chi(v^2 - 1) = \tfrac{1}{2}(q-3) - \tfrac{1}{2}(q-1) = -1$.
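Theorems 1 to 4 can be checked numerically for small fields. The following is a minimal sketch in Python, restricted to prime fields $GF(p)$ (so $n = 1$); the function names `chi` and `check_theorems` are illustrative and do not come from the paper.

```python
def chi(a, p):
    """Legendre function on GF(p), p an odd prime: 0, +1 (Q.R.) or -1 (non-Q.R.)."""
    a %= p
    if a == 0:
        return 0
    squares = {(b * b) % p for b in range(1, p)}
    return 1 if a in squares else -1


def check_theorems(p):
    """Return four booleans, one for each of Th. 1 to Th. 4 over GF(p)."""
    field = range(p)
    # Th. 1: chi sums to zero over the whole field.
    th1 = sum(chi(a, p) for a in field) == 0
    # Th. 2: chi is multiplicative.
    th2 = all(chi(a, p) * chi(b, p) == chi(a * b, p) for a in field for b in field)
    # Th. 3: chi(-1) = +1 when p = 4t + 1 and -1 when p = 4t - 1.
    th3 = chi(-1, p) == (1 if p % 4 == 1 else -1)
    # Th. 4: the shifted product sums to -1 whenever i1 != i2.
    th4 = all(
        sum(chi(j - i1, p) * chi(j - i2, p) for j in field) == -1
        for i1 in field
        for i2 in field
        if i1 != i2
    )
    return th1, th2, th3, th4


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13):
        print(p, check_theorems(p))
```

An exhaustive check of this kind is only practical for small $p$; it illustrates the statements and is no substitute for the proofs.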
Applications 


I. Consider the matrix $A = (a_{ij})$ ($i, j = 0, 1, 2, \ldots, p$) of order $(p+1)$, where $p = 4t-1$.

$a_{i0} = a_{0j} = +1$,
$a_{ij} = \chi(j - i)$ ($i \neq 0$, $j \neq 0$, $i \neq j$),
$a_{ii} = -1$.

The scalar product of 1st and $(i+1)$th rows

$= a_{00}\,a_{i0} + a_{0i}\,a_{ii} + \sum_{j=1}^{p} \chi(j - i)$
$= 1 - 1 + 0 = 0$ (Th. 1).

Scalar product of $(i_1+1)$th and $(i_2+1)$th rows

$= a_{i_1 0}\,a_{i_2 0} + a_{i_1 i_1}\,a_{i_2 i_1} + a_{i_1 i_2}\,a_{i_2 i_2} + \sum_{j=1}^{p} \chi(j - i_1)\,\chi(j - i_2)$
$= 1 - \chi(i_1 - i_2) - \chi(i_2 - i_1) - 1$ (Th. 4)
$= 0$ since $p = 4t-1$ (Th. 3).

Hence the matrix $A$ is orthogonal.
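The construction in I can be tried out directly. The sketch below assumes $p$ is a prime with $p = 4t - 1$ and builds the matrix exactly as defined above; `paley_matrix` and `is_orthogonal` are hypothetical helper names introduced for illustration.

```python
def chi(a, p):
    """Legendre function on GF(p) for an odd prime p."""
    a %= p
    if a == 0:
        return 0
    return 1 if a in {(b * b) % p for b in range(1, p)} else -1


def paley_matrix(p):
    """Matrix A of order p + 1 from application I (p prime, p = 4t - 1)."""
    if p % 4 != 3:
        raise ValueError("this construction needs p = 4t - 1")
    n = p + 1
    A = [[1] * n for _ in range(n)]          # first row and first column are +1
    for i in range(1, n):
        for j in range(1, n):
            A[i][j] = -1 if i == j else chi(j - i, p)
    return A


def is_orthogonal(A):
    """True when distinct rows have zero scalar product and every entry is +1 or -1."""
    n = len(A)
    if any(x not in (1, -1) for row in A for x in row):
        return False
    return all(
        sum(x * y for x, y in zip(A[i], A[k])) == 0
        for i in range(n)
        for k in range(i + 1, n)
    )


if __name__ == "__main__":
    for p in (3, 7, 11, 19, 23):
        print(p + 1, is_orthogonal(paley_matrix(p)))
```

The indices $1, \ldots, p$ stand for the field elements $1, \ldots, p-1, 0$, so `chi(j - i, p)` is evaluated modulo $p$.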


II. To construct $A$ of order $p^n + 1 = 4t$, we associate the rows and columns (except the first) with the elements of $GF(p^n)$ and the proof runs exactly as before.

III. If $A$ is orthogonal, $\begin{pmatrix} A & A \\ A & -A \end{pmatrix}$ is also orthogonal and has double the order of $A$.

Hence an orthogonal matrix $A$ of order $2^m(p^n + 1)$ (where $p^n = 4t-1$) or $2^m$ can be constructed by successive doubling.
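The doubling step of III is short enough to state as code. This is a minimal sketch; the names `double` and `rows_orthogonal` are illustrative only.

```python
def double(A):
    """Return the block matrix [[A, A], [A, -A]], of twice the order of A."""
    top = [row + row for row in A]
    bottom = [row + [-x for x in row] for row in A]
    return top + bottom


def rows_orthogonal(A):
    """True when every pair of distinct rows has zero scalar product."""
    n = len(A)
    return all(
        sum(x * y for x, y in zip(A[i], A[k])) == 0
        for i in range(n)
        for k in range(i + 1, n)
    )


if __name__ == "__main__":
    H = [[1]]                      # order 1, trivially orthogonal
    for _ in range(4):
        H = double(H)              # orders 2, 4, 8, 16
        print(len(H), rows_orthogonal(H))
```

Starting from the matrix of order 1 gives the orders $2^m$; starting from any other orthogonal matrix of plus and minus ones doubles its order in the same way.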


IV. If $p^n = 4t+1$, $(p^n + 1)$ is not divisible by 4.
But an $A$ of order $2(p^n + 1)$ can be obtained by a slight modification of the method.
Consider the matrix $B = (b_{ij})$ ($i, j = 0, 1, 2, \ldots, p^n$) of order $(p^n + 1)$ [$p^n = 4t+1$].

$b_{ij} = \chi(u_j - u_i)$ ($i \neq 0$, $j \neq 0$),

where $u_i$ is the element of $GF(p^n)$ associated with the $(i+1)$th row and column of $B$,

$b_{i0} = b_{0j} = +1$ ($i \neq 0$, $j \neq 0$),
$b_{00} = 0$.

Scalar product of 1st and $(i+1)$th rows

$= b_{00}\,b_{i0} + \sum_{j=1}^{p^n} \chi(u_j - u_i) = 0$ (Th. 1).

Scalar product of $(i_1+1)$th and $(i_2+1)$th rows

$= b_{i_1 0}\,b_{i_2 0} + b_{i_1 i_1}\,b_{i_2 i_1} + b_{i_1 i_2}\,b_{i_2 i_2} + \sum_{j=1}^{p^n} \chi(u_j - u_{i_1})\,\chi(u_j - u_{i_2})$
$= 1 - 1 = 0$ (Th. 4).

Thus $B$ is orthogonal.


Now replace

$+1$ by the submatrix $C = \begin{pmatrix} +1 & +1 \\ +1 & -1 \end{pmatrix}$,

$-1$ by the submatrix $-C = \begin{pmatrix} -1 & -1 \\ -1 & +1 \end{pmatrix}$,

$0$ by the submatrix $D = \begin{pmatrix} +1 & -1 \\ -1 & -1 \end{pmatrix}$.

The new matrix $A$ thus formed is of order $2(p^n + 1)$.

Consider the scalar products of the $(2i_1+1)$th and $(2i_1+2)$th rows with the $(2i_2+1)$th and $(2i_2+2)$th rows. This is a $(2 \times 2)$ matrix $M_{i_1 i_2}$.

Now

$M_{i_1 i_2} = \sum_{j=0}^{p^n} (b_{i_1 j} C)(b_{i_2 j} C') + (D)(b_{i_2 i_1} C') + (b_{i_1 i_2} C)(D')$ [$j \neq i_1$, $j \neq i_2$]

$= CC' \sum_{j=0}^{p^n} b_{i_1 j}\,b_{i_2 j} + \chi(u_{i_1} - u_{i_2})(DC' + CD')$ [$j \neq i_1$, $j \neq i_2$]
(by Th. 2 and Th. 3 for $p^n = 4t+1$)

$= CC' \cdot 0 + \chi(u_{i_1} - u_{i_2}) \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$,

since $\sum_{j=0}^{p^n} b_{i_1 j}\,b_{i_2 j} = 0$ (orthogonality of $B$) and the omitted terms vanish,

$= \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$.

Finally, it is clear that the $(2i_1+1)$th and $(2i_1+2)$th rows are orthogonal to each other.
Hence $A$ is orthogonal and of the type required.
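The modified method of IV can be sketched for a prime $p = 4t + 1$ (so $n = 1$). The matrices $C$ and $D$ below are those given above; the function names are hypothetical and the check is exhaustive rather than algebraic.

```python
def chi(a, p):
    """Legendre function on GF(p) for an odd prime p."""
    a %= p
    if a == 0:
        return 0
    return 1 if a in {(b * b) % p for b in range(1, p)} else -1


C = [[1, 1], [1, -1]]
D = [[1, -1], [-1, -1]]


def matrix_b(p):
    """Matrix B of order p + 1: zero diagonal, +1 border, chi(u_j - u_i) elsewhere."""
    n = p + 1
    B = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == 0 and j == 0:
                B[i][j] = 0
            elif i == 0 or j == 0:
                B[i][j] = 1
            else:
                B[i][j] = chi(j - i, p)
    return B


def expand(B):
    """Replace +1 by C, -1 by -C and 0 by D, giving a matrix of twice the order."""
    n = len(B)
    A = [[0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            b = B[i][j]
            block = D if b == 0 else [[b * x for x in row] for row in C]
            for r in range(2):
                for c in range(2):
                    A[2 * i + r][2 * j + c] = block[r][c]
    return A


def is_orthogonal(A):
    """True when entries are +1 or -1 and distinct rows have zero scalar product."""
    n = len(A)
    if any(x not in (1, -1) for row in A for x in row):
        return False
    return all(
        sum(x * y for x, y in zip(A[i], A[k])) == 0
        for i in range(n)
        for k in range(i + 1, n)
    )


if __name__ == "__main__":
    for p in (5, 13, 17):
        print(2 * (p + 1), is_orthogonal(expand(matrix_b(p))))
```

With $p = 5$ this yields order 12 and with $p = 13$ order 28, in agreement with the orders $2(p^n + 1)$ stated in the text.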


Thus by successive doubling we may obtain matrices of order $2^m(p^n + 1)$ where $p^n = 4t+1$.

V. Summing up we have:

If $N = 2^m(p^n + 1) = 4K$, where $p$ is an odd prime or zero, an orthogonal matrix $A$ can be constructed with plus and minus ones.

The matrices constructible by these methods include all values of $N = 4K$ up to 100 excepting 92. Those of order $2^r$ are structurally the same as the complete factorial design in $r$ factors if they have been obtained by successive doubling. These will be called geometrical designs because of their close connexion with finite geometries. It is clear that if two columns of the design represent main effects, and if the interaction column corresponding to them in the complete factorial case is a dummy in the actual experiment, it may be used to estimate the interaction. The condition for a column to be the interaction between $p$ other columns is that it is $\pm D$, where $D$ is a column vector whose $i$th element is the product of the $i$th elements of the original $p$ columns. So far interaction columns have only been found in the geometrical designs and in them every interaction between an arbitrary set of columns is a column of the design. It must also be mentioned that the cyclic designs for $N = 2^r$ obtainable by the method of § 8 depending on $GF(2^r)$ are in fact merely permutations of the geometrical designs. They are the forms used in the tables for convenience.


6. CASE OF MORE THAN TWO LEVELS


We now provide experimental designs for determining component effects with maximum precision when the number of values $L$ is greater than 2. These solutions cover the cases where the number of assemblies $N = L^r$, $L$ being a prime or a power of a prime and $r$ any positive integer. Two methods are given: in the first, successive columns of the design are formed by simple operations on the preceding columns; in the second, which is of more limited application, the design is specified by one column, all others being cyclic permutations of this.


7. MODIFIED FACTORIAL DESIGNS


The methods given in this section for the construction of multifactorial designs, although 
discovered independently, are nevertheless identical with those used by Bose & Kishen 
(1940) to express the generalized interaction for the purpose of confounding certain contrasts 
with block differences in agricultural experiments. They construct their interactions 
directly from finite projective geometries without using, as we have done, the intermediate 
device of orthogonal sets of Latin squares. We shall, however, describe these methods, as
they may not be familiar to experimenters in this country, especially not in the way in which 
we propose using them. 

Suppose a complete factorial experiment is carried out for $r$ factors each at $L$ levels (i.e. in this case $r$ components measured at $L$ values) so that $L^r$ assemblies are made. Let the levels be called $0, 1, 2, \ldots, (L-1)$. Then the $r$ main effects define $r$ columns of a design array (with $L^r$ rows) containing these $L$ symbols. Each symbol appears the same number of times in a column as any other. Each combination of symbols for two columns occurs equally frequently. We shall apply the term orthogonal to such a pair of columns. Now let $A$, $B$ be two orthogonal columns. The interaction $AB$ has $(L-1)^2$ degrees of freedom. Since each column of the array is associated with $(L-1)$ degrees of freedom, a first-order interaction is represented by a set of $(L-1)$ columns which will be called the terms of this interaction. Similarly, an interaction of the $m$th order (i.e. involving $m+1$ factors) is represented by $(L-1)^m$ columns of the design array.

Now an interaction between two factors is most naturally defined by the following 
conditions: 

(a) Each combination of levels of A and B corresponds to only one level within each 
term of AB. 

(b) The terms of AB are orthogonal to A and B and to one another. 

Condition (a) means that if, for instance, in one assembly level 2 of A and level 5 of B 
occur together, and if a term of AB is defined to appear at level 3 in this case, then whenever 
A and B occur at these levels together again, this term of A B appears at level 3. Since, owing 
to the complete factorial basis of the design, every combination of levels of ABC occurs 
equally often, each combination of levels of A and B occurs equally often with every level 
of C. But such a combination of levels of A and B fixes the level in a term of A B by condition 
(a). Hence each level in a term of AB occurs equally often with every level of C. In other 
words, each interaction term will be orthogonal to the main effects not connected with it. 

Now for condition (b). If the rows and columns of a square $L \times L$ array correspond respectively to the $L$ levels of $A$ and $B$, each cell may be filled up with the level appropriate to a particular interaction term. For any particular term of $AB$ such a square will be Latin, because, regarding a row, the level of $A$ is fixed; all the levels of the interaction term must occur equally often with this level of $A$ and hence each symbol appears once in every row of the square; similarly it appears once in every column in order that the interaction term may be orthogonal to $B$. Finally, superimposing the Latin squares for two terms of the same interaction, each symbol belonging to the first term must appear once in the same cell with each symbol of the second term, in order that these two may be orthogonal: thus if conditions (a) and (b) are satisfied the interaction terms are founded upon a completely orthogonal set of Latin squares.

It remains to show that the terms from two different interactions are orthogonal. This follows because every combination of levels of four factors $ABCD$ occurs equally frequently, i.e. each combination of levels of $A$ and $B$ occurs equally often with every combination of levels of $C$ and $D$. The former correspond to levels in the terms of $AB$; the latter to levels in terms of $CD$. Hence a term of $AB$ is orthogonal to a term of $CD$. Similarly $AB$ and $AC$ may be dealt with. This shows that the first order interaction terms may be joined to the main factors as part of the balanced design. Higher order interactions may be regarded as first order interactions between those of lower order, e.g. $(ABC) = (AB)(C)$, it being understood that $(L-1)$ terms are derived from each term of $(AB)$ taken with the factor $C$, so that in this case there will be $(L-1)^2$ terms for the second order interaction. This procedure builds up the design by an inductive process, and when the interaction of the $(r-1)$th order has been obtained, it will be complete. The total degrees of freedom in the original factorial design $= L^r - 1$. Hence the number of factors that may be measured if interactions are neglected

$= (L^r - 1)/(L - 1)$.

It may be remarked that there is nothing new in this treatment of the complete factorial design except the modification of the usual Fisher interactions so that they may be placed on the same footing as main effects.





If $L$ is a prime number, cyclic Latin squares exist forming an orthogonal set: each square is obtained by writing the first row of symbols in standard order, successive rows being obtained by shifting the symbols along $p$ places from each row to the next ($p = 1, 2, \ldots, (L-1)$). In this case the appropriate column of the design is formed as follows: assuming the first row in the order $0, 1, 2, \ldots, (L-1)$, if $x$ level of $A$ and $y$ level of $B$ occur together, the corresponding level for this interaction term is $(y + px)$, the symbols being reduced with modulus $L$. The squares for $p = 1, 2, \ldots, (L-1)$ give all the interaction terms.
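The rule $(y + px)$ reduced with modulus $L$ can be illustrated for two factors. The sketch below builds the $L^2$ rows of the complete factorial with the $(L-1)$ interaction terms appended, and checks that every pair of columns is orthogonal in the sense defined earlier; the function names are illustrative and not taken from the paper.

```python
from collections import Counter
from itertools import combinations, product


def two_factor_design(L):
    """Rows (x, y, term_1, ..., term_{L-1}) with term_p = (y + p*x) mod L."""
    return [
        [x, y] + [(y + p * x) % L for p in range(1, L)]
        for x, y in product(range(L), repeat=2)
    ]


def columns_orthogonal(rows, L):
    """True when every pair of columns shows each of the L*L symbol pairs equally often."""
    expected = len(rows) // (L * L)
    for c1, c2 in combinations(range(len(rows[0])), 2):
        counts = Counter((row[c1], row[c2]) for row in rows)
        if len(counts) != L * L or set(counts.values()) != {expected}:
            return False
    return True


if __name__ == "__main__":
    for L in (3, 5, 7):
        print(L, columns_orthogonal(two_factor_design(L), L))
    # L = 4 is not prime, so arithmetic modulo L no longer gives orthogonal squares.
    print(4, columns_orthogonal(two_factor_design(4), 4))
```

For $L = 3$ this gives 9 rows and $(9-1)/(3-1) = 4$ columns. The failure for $L = 4$ shows why prime powers need the Galois field arithmetic rather than arithmetic modulo $L$.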

When $L$ is the $n$th power of a prime $p$, then we associate the $L$ levels of a factor with the elements of a Galois field, $GF(p^n)$. Suppose these elements to be $u_0, u_1, u_2, \ldots, u_{L-1}$ where $u_0$ is the zero and $u_1$ the unity of the field. If then $u_x$ level of $A$ and $u_y$ level of $B$ occur together, the corresponding levels for interaction terms are $u_y + u_p u_x$, and the squares for $u_p = u_1, u_2, \ldots, u_{L-1}$ give all the terms present. The method of constructing completely orthogonal sets of Latin squares from Galois fields is given in Stevens (1939).

For L = 6 it is known that no pair of orthogonal Latin squares exists so it is not amenable 
to this treatment. The design for N = 9, L = 3 is given below; the accompanying key refers 
to the column vectors and the rows are labelled as if belonging to a complete factorial design. 


Row     A   B   (AB)_1  (AB)_2
a1b1    0   0     0       0
a1b2    0   1     1       2
a1b3    0   2     2       1
a2b1    1   0     1       1
a2b2    1   1     2       0
a2b3    1   2     0       2
a3b1    2   0     2       2
a3b2    2   1     0       1
a3b3    2   2     1       0

Key: (AB)_1 = (A) + (B), (AB)_2 = (A) + 2(B).





For r = 4, N = 81, if the main effects are taken as A, B, C, D, (ABCD), then all first order
interactions are determinable. 


8. CYCLIC SOLUTIONS WHEN L IS A PRIME NUMBER


We shall again be concerned in this section with Galois fields: each element of $GF(p^n)$ will be represented by a set of $n$ ordered numbers, each number being $0, 1, 2, \ldots, (p-1)$ where $p$ is prime. Consider the block $B_1$ of elements which have the integer $r$ in position $s$ ($r = 0, 1, 2, \ldots, p-1$; $s = 1, 2, 3, \ldots, n$).

For example, in $GF(3^2)$ the block having 2 in position 1 is 20, 21, 22. We require to show that if the elements of this block are multiplied in succession by any element of the field other than $00\ldots0t$ ($1 \le t \le p-1$), then the elements of the resulting block $B_2$ have equal numbers of all possible $r$ in position $s$. There are in fact $p^{n-1}$ elements in $B_2$, and we need to prove that $B_2$ is subdivisible into

$p^{n-2}$ elements having 0 in position $s$,
$p^{n-2}$ elements having 1 in position $s$,
...
$p^{n-2}$ elements having $p-1$ in position $s$.

There is no loss in generality if we consider $s = 1$, i.e. we refer now to the first members of all elements of the field. Take now all the elements having 1 in this position. Multiply this block $A_1$ by any element $b$ of the field and obtain block $A_2$. Suppose in $A_2$ that $r_0$ first members are 0, $r_1$ first members are 1, ..., and $r_{p-1}$ first members are $p-1$. Clearly $\sum_{i=0}^{p-1} r_i = p^{n-1}$.
0 


Case 1. $r_0, r_1, \ldots, r_{p-1}$ all $\neq 0$.

Form the complete block $C_1$ (first member 0) by subtracting one element of $A_1$ from all other elements of $A_1$. If $C_1$ is multiplied by $b$ we get the block $C_2$ formed also by subtracting one element of $A_2$ from all other elements of $A_2$. We can form the elements of $C_2$ (first member 0) by subtracting one of the $r_0$ elements of $A_2$ (first member 0) from itself and all the other $r_0 - 1$ such elements. Hence there are $r_0$ elements of $C_2$ (first member 0).

We can also form the elements of $C_2$ (first member 0) by subtracting one of the $r_1$ elements of $A_2$ (first member 1) from itself and all the other $r_1 - 1$ such elements. Hence there are $r_1$ elements of $C_2$ (first member 0).

Hence $r_0 = r_1 = r_2 = \ldots = r_{p-1} = p^{n-2}$.

This result must be true for all other first members, since all elements of the field are obtainable from those in $A_1$ by addition or subtraction.

Case 2. One or more of $r_0, r_1, r_2, \ldots, r_{p-1} = 0$.

Suppose in fact that $r_i, r_j, \ldots, r_k \neq 0$. Exactly as above, we can show $r_i = r_j = \ldots = r_k$. This leads to a contradiction since $p^{n-1}$ is not divisible by a number less than $p$, unless we have

Case 3. All except one of $r_0, r_1, \ldots, r_{p-1} = 0$.

Suppose that the first members of all elements in $A_2$ are $w$, where $1 \le w \le p-1$. They cannot all be 0 since we can generate the whole field by addition and subtraction among the elements of $A_1$ and therefore the same among the elements of $A_2$. This would lead to all first members being 0 which is a contradiction unless $b$ is the zero of the field.

Now we can find $m$ such that $mw \equiv 1 \pmod{p}$. Hence $mb$ (i.e. $b + b + \ldots + b$) times $A_1$ gives a block all of whose first members are also 1 (block $D$). Therefore multiplying by $(mb)^{-1}$ gives a block all of whose first members are 1 (block $E$). Subtract block $D$ from block $E$ (i.e. $i$th element from $i$th element) and obtain a block all of whose first members are 0. This must lead as before to first members of all blocks being 0. Hence the multiplier

$(mb) - (mb)^{-1} = 00\ldots0$,

therefore $(mb) = (mb)^{-1}$, i.e. $mb = \pm$ the unity of the field.

Hence the only possible $b$ for Case 3 are $00\ldots0t$ where $1 \le t \le p-1$.

Write now the first members of the field elements in the order generated by $c$, a primitive root of the field, and its powers, i.e.

$00\ldots00,\ 00\ldots01,\ c,\ c^2,\ \ldots,\ c^{p^n-2}$ ($c^{p^n-1} = 1$).

If these elements are multiplied in turn by $c, c^2, \ldots$, we obtain a cyclic permutation on all elements other than the zero, and by the above theorem any pair of columns satisfies the required symmetrical property. Multiplication by $c^{u(p^n-1)/(p-1)}$ will multiply columns by $00\ldots0t$, where $t$ takes the values $1, 2, \ldots, p-1$, and hence the required property is satisfied only by the powers $c, c^2, \ldots, c^{(p^n-1)/(p-1)}$.

The proof that $c^{u(p^n-1)/(p-1)} = 00\ldots0t$ is as follows:

The element $00\ldots0t$ is expressible in the form $c^x$, therefore

$c^{x(p-1)} = (00\ldots0t)^{p-1} = 1 = c^{p^n-1} = c^{u(p^n-1)}$.

Therefore $x(p-1) = u(p^n-1)$.

For example, the elements of $GF(3^2)$ written in the order generated by powers of a primitive root are

00,.Ol,. 2a, SQ -F,- GR, (88,58, 48. 

Taking the first members, we obtain a cyclic solution for N = 9, L = 3: 


me DONNDK OO 
Orr NONNE OS 
-OFKKDONNO 


wueKOrFfrKNON CO 


Thus, from the field $GF(p^n)$ we may obtain a cyclic solution for the case $L = p$, $N = p^n$. In the table of designs given below the first column of a cyclic solution is provided corresponding to the Galois fields $2^4$, $2^5$, $3^2$, $3^3$, $3^4$, $5^2$, $5^3$ and $7^2$. These have been taken from the tables in Stevens (1939), forming the basis of a series of completely orthogonal sets of cyclic Latin squares.
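The cyclic method can be sketched for $GF(3^2)$. The code below represents an element as a pair $(x, y)$ standing for $x\theta + y$ and assumes the relation $\theta^2 = 2\theta + 1$, under which $\theta$ is a primitive root; this choice of relation is an assumption made for illustration, so the sequence produced need not coincide with the one printed in the paper, which may rest on a different root.

```python
from collections import Counter
from itertools import combinations

P = 3  # the prime p; the field is GF(P**2)


def times_c(element):
    """Multiply x*theta + y by theta, using the assumed relation theta^2 = 2*theta + 1."""
    x, y = element
    return ((2 * x + y) % P, x)


def cyclic_solution():
    """First members of 01, c, c^2, ..., and the design with N = 9 rows."""
    element = (0, 1)                      # the unity 01
    first_members = []
    for _ in range(P * P - 1):
        first_members.append(element[0])
        element = times_c(element)
    assert element == (0, 1)              # c has order p^n - 1, so c is primitive
    period = len(first_members)
    n_cols = (P * P - 1) // (P - 1)       # powers c, c^2, ..., c^{(p^n-1)/(p-1)}
    rows = [[0] * n_cols]                 # the row belonging to the zero element
    for i in range(period):
        rows.append([first_members[(i + k) % period] for k in range(n_cols)])
    return first_members, rows


def columns_orthogonal(rows, L):
    """True when each pair of columns shows every symbol pair equally often."""
    expected = len(rows) // (L * L)
    for c1, c2 in combinations(range(len(rows[0])), 2):
        counts = Counter((row[c1], row[c2]) for row in rows)
        if len(counts) != L * L or set(counts.values()) != {expected}:
            return False
    return True


if __name__ == "__main__":
    seq, design = cyclic_solution()
    print(seq)
    print(columns_orthogonal(design, P))
```

Each column is a cyclic shift of the first, with the zero element contributing a row of zeros, which is the structure described in the text.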


9. EXPERIMENTAL PROCEDURE 


Suppose that the investigator is presented with an assembly containing 9 components and the problem of determining the effect of each of these in the performance of the whole. He decides upon an experiment in which each component appears at two values throughout and main effects are determined with a precision four times as great as that with which an assembly can be measured; in other words, the appropriate design is that for L = 2, N = 16. On referring to the table below he finds the design represented symbolically as follows (an explanation appears in a few lines):

++++-+-++--+---

The complete design is generated by taking this as the first column (or row), shifting it cyclically one place fourteen times and adding a final row of minus signs, thus:

+---+--++-+-+++
++---+--++-+-++
+++---+--++-+-+
++++---+--++-+-
-++++---+--++-+
+-++++---+--++-
-+-++++---+--++
+-+-++++---+--+
++-+-++++---+--
-++-+-++++---+-
--++-+-++++---+
+--++-+-++++---
-+--++-+-++++--
--+--++-+-++++-
---+--++-+-++++
---------------


The rows of this design may be taken as referring to assemblies and the columns to components. In the case in point there are nine components so that only nine columns are required. Select any nine columns, say the first nine, and obtain:


                  Components
              1 2 3 4 5 6 7 8 9
Assembly  1   + - - - + - - + +
          2   + + - - - + - - +
          3   + + + - - - + - -
          4   + + + + - - - + -
          5   - + + + + - - - +
          6   + - + + + + - - -
          7   - + - + + + + - -
          8   + - + - + + + + -
          9   + + - + - + + + +
         10   - + + - + - + + +
         11   - - + + - + - + +
         12   + - - + + - + - +
         13   - + - - + + - + -
         14   - - + - - + + - +
         15   - - - + - - + + -
         16   - - - - - - - - -


The components have been labelled 1, 2, ...,8,9: a plus corresponding to component 7 in 
assembly 3 means that in that assembly component 7 appears at its extreme value; a minus 
corresponding to component 3 in assembly 12 means that in that assembly component 3 
appears at its nominal value; and similarly. It will be seen that each component appears 
eight times at an extreme value and eight times at nominal, so that the arrangement is 
perfectly symmetrical. The investigator now proceeds to set up assemblies according to this 
design, to measure whatever characteristic of them is in mind, and to record the results. 
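The layout just described can be generated mechanically. The sketch below starts from the symbolic row for N = 16 quoted above, shifts it cyclically, appends the final row of minus signs and keeps the first nine columns; `cyclic_design` is an illustrative name, not one used in the paper.

```python
FIRST = "++++-+-++--+---"   # the N = 16 entry quoted in the text


def cyclic_design(first):
    """First column = `first`; each later column is the previous one shifted down one place."""
    g = [1 if s == "+" else -1 for s in first]
    n = len(g)
    rows = [[g[(i - j) % n] for j in range(n)] for i in range(n)]
    rows.append([-1] * n)                 # the final row of minus signs
    return rows


def plus_assemblies(design, component):
    """1-based assembly numbers in which the given 1-based component is at plus."""
    return [i + 1 for i, row in enumerate(design) if row[component - 1] == 1]


if __name__ == "__main__":
    design = cyclic_design(FIRST)
    nine = [row[:9] for row in design]    # nine components need nine columns
    for number, row in enumerate(nine, start=1):
        print(f"{number:2d}  " + " ".join("+" if x == 1 else "-" for x in row))
    print(plus_assemblies(design, 5))
```

Any nine of the fifteen columns would serve equally well; the first nine are taken here only to match the example.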


10. ANALYSIS OF THE RESULTS 

The results are in the form: measurement on assembly 1 = $r_1$, measurement on assembly 2 = $r_2$, ..., measurement on assembly 16 = $r_{16}$. The effect of component 5, say, is required. Observe now that this component appears as plus in assemblies 1, 5, 6, 7, 8, 10, 12 and 13; and as minus in assemblies 2, 3, 4, 9, 11, 14, 15 and 16. Then the best estimate $m_5$ of the contribution of component 5 to the assembly characteristic due to its shift in value is

$m_5 = (r_1 + r_5 + r_6 + r_7 + r_8 + r_{10} + r_{12} + r_{13} - r_2 - r_3 - r_4 - r_9 - r_{11} - r_{14} - r_{15} - r_{16})/16$,

all observations where component 5 appears as plus being taken positively and where it appears as minus being taken negatively, and the divisor being the number of assemblies made up. A solution similar to this, and as simple, holds for all designs where $L = 2$, and the general method of which components to put in which assemblies and how to evaluate the effects should now be apparent.

The results provide in addition an estimate of the experimental error, obtained as follows. Suppose that instead of 9 components, 15 had been used, laid out in accordance with the experimental design given above. Then $m_{12}$, for example, would have been evaluated by the equation

$m_{12} = (r_2 + r_4 + r_5 + r_8 + r_{12} + r_{13} + r_{14} + r_{15} - r_1 - r_3 - r_6 - r_7 - r_9 - r_{10} - r_{11} - r_{16})/16$.

In general, with $n$ components, the quantities $m_{n+1}, m_{n+2}, \ldots, m_{4K-1}$ can be evaluated from the equations (number of assemblies $N = KL^2 = 4K$ here). Since there are just $n$ components, these quantities should each be zero. In actual practice this will not be so due to experimental error. The variance due to error is estimated by the formula

$s^2 = 4K(m_{n+1}^2 + m_{n+2}^2 + \ldots + m_{4K-1}^2)/(4K - n - 1)$.

Here $s^2 = 16(m_{10}^2 + m_{11}^2 + \ldots + m_{15}^2)/6$ and the error variance of $m_i$ is $s_m^2 = s^2/4K$. This formula is, as proved above, equivalent to the usual sum of squares of residuals divided by the degrees of freedom; degrees of freedom for error $= (4K - 1) - n$.
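The estimates of the effects and of the error variance can be computed together. The following sketch uses the N = 16 design of the example with made-up results; the function `analyse` and the numerical values are assumptions introduced purely for illustration.

```python
FIRST = "++++-+-++--+---"   # the N = 16 entry quoted in the text


def cyclic_design(first):
    """Cyclic design with a final row of minus signs, as in the example for N = 16."""
    g = [1 if s == "+" else -1 for s in first]
    n = len(g)
    rows = [[g[(i - j) % n] for j in range(n)] for i in range(n)]
    rows.append([-1] * n)
    return rows


def analyse(design, results, n_components):
    """Return (effects m_1..m_n, error variance s^2, error variance of each m_i)."""
    N = len(design)
    m = [
        sum(row[j] * r for row, r in zip(design, results)) / N
        for j in range(len(design[0]))
    ]
    dummies = m[n_components:]            # columns carrying no component
    s2 = N * sum(d * d for d in dummies) / (N - n_components - 1)
    return m[:n_components], s2, s2 / N


if __name__ == "__main__":
    design = cyclic_design(FIRST)
    true_effects = [1, -2, 3, 0, 5, -1, 2, 4, -3]       # hypothetical values
    results = [10 + sum(a * t for a, t in zip(row, true_effects)) for row in design]
    results[0] += 1.6                                    # a hypothetical disturbance
    effects, s2, var_m = analyse(design, results, 9)
    print(effects)
    print(s2, var_m)
```

With nine components the six remaining columns are the dummies, so the divisor $4K - n - 1$ equals 6, as in the text.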

A correction is necessary here. It will not usually be possible to select components whose values are exactly at nominal or extreme. All components will in any case have to be measured and the extent to which they differ from the aimed-at values will affect the values of $m_i$ and $s^2$. Suppose that 'nominal' components are selected from a small range whose centre is the nominal value; and similarly at the extreme. For the $i$th component the difference in value between nominal and extreme is $2t_i$. If the component differs from the aimed-at value by $c_i$ and if $b_i = c_i/t_i$, then the equations we are solving, instead of being of the form

$r_j = M + a_{j1} m_1 + a_{j2} m_2 + \ldots + a_{jn} m_n$,

where the coefficients $a_{ji}$ are $+1$ or $-1$, are of the form

$r_j = M + (a_{j1} + b_{j1}) m_1 + (a_{j2} + b_{j2}) m_2 + \ldots + (a_{jn} + b_{jn}) m_n$,

i.e. $R = (A + B)X$ where capital letters refer to the appropriate matrices. An approximate solution for $X$ is obtained from $R = AX$, as above, and closer approximations may be obtained by iteration; a detailed treatment of the method is given in Lindley (1946).
Standard statistical methods now apply in determining the significance of effects and of 
differences between effects; whether the tolerance on a certain component may be increased 
and what would happen to the assembly characteristics if this were done; whether it is 
advisable to reduce the tolerance on another because of the large-scale effect allowed by the 
existing tolerance; whether the design of the assembly is correct in the sense that if both ends 
of the tolerance range have been explored the results show that the nominal value of each 
component is in the optimum position: these questions, and many like them depending on 
particular circumstances, may now all be answered. Errors may of course in all cases be 
reduced by replication, but it is suggested that, in order to obtain the best selection from the 
set of all possible assemblies (the complete factorial experiment) and thus minimize the errors 
due to interactions between components (here neglected as small), a complete design should 
be chosen in preference to the repetition of a smaller one. This aspect must not be confused 
with the fact that certain designs are obtained from smaller ones by the process of doubling, 
which is an entirely different thing. The designs in the table below (pp. 323, 324) will apply
directly to any experiment requiring less than 100 assemblies; should larger designs be
required, they may be constructed by the general methods given.
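The correction for components that miss their aimed-at values, $R = (A + B)X$, lends itself to a simple fixed-point iteration. The sketch below is one plausible scheme and is not claimed to be the method of Lindley (1946): it starts from the solution of $R = AX$ and repeatedly corrects the results for the current estimate of $BX$. Here `H` is the design with a leading column of ones for $M$, `P` holds the offsets $b_{ji}$ with a zero first column, and all numerical values are hypothetical.

```python
FIRST = "++++-+-++--+---"   # the N = 16 entry quoted in the text


def design_with_mean_column(first, n_components):
    """Rows (1, a_j1, ..., a_jn): a leading 1 for M followed by the design signs."""
    g = [1 if s == "+" else -1 for s in first]
    n = len(g)
    rows = [[g[(i - j) % n] for j in range(n_components)] for i in range(n)]
    rows.append([-1] * n_components)
    return [[1] + row for row in rows]


def solve_by_iteration(H, P, r, sweeps=40):
    """Iterate x <- H'(r - P x)/N, whose first sweep is the solution of R = AX."""
    N, n = len(H), len(H[0])
    x = [0.0] * n
    for _ in range(sweeps):
        corrected = [r[i] - sum(P[i][j] * x[j] for j in range(n)) for i in range(N)]
        x = [sum(H[i][j] * corrected[i] for i in range(N)) / N for j in range(n)]
    return x


if __name__ == "__main__":
    H = design_with_mean_column(FIRST, 9)
    # Hypothetical small offsets b_ji; the first column (for M) has none.
    P = [[0.0] + [0.015 * (((7 * i + 3 * j) % 5) - 2) for j in range(9)] for i in range(16)]
    x_true = [10.0, 1.0, -2.0, 0.5, 3.0, 0.0, 1.5, -1.0, 2.0, 0.25]
    r = [sum((H[i][j] + P[i][j]) * x_true[j] for j in range(10)) for i in range(16)]
    print(solve_by_iteration(H, P, r))
```

The iteration relies on the offsets being small compared with unity, so that each sweep reduces the error; with large offsets a direct least-squares solution would be the safer choice.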


11. RELATIONSHIP BETWEEN MULTIFACTORIAL AND 
BALANCED INCOMPLETE BLOCK DESIGNS 


We begin for convenience with the definition of a balanced incomplete block design (Fisher & Yates, 1943). In this, $v$ varieties are placed in blocks of $k$ experimental units ($k$ being less than $v$) such that every two varieties occur together in the same number ($\lambda$) of blocks; each variety appears $r$ times in all and the number of blocks is $b$. Whence $rv = bk$ and

$\lambda = r(k-1)/(v-1)$.

Consider any of the multifactorial designs for $L = 2$. Let the rows refer to blocks and the columns to varieties; and suppose that a plus sign represents the appearance of a variety in a block, a minus sign the non-appearance. For $N = 4m$ we obtain a balanced incomplete block design with $b = v = 4m-1$, $k = r = 2m$ and $\lambda = m$; the complementary design has $b = v = 4m-1$, $k = r = 2m-1$ and $\lambda = m-1$. The proof follows immediately from the orthogonality of the columns of the multifactorial design.
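The parameters quoted for $L = 2$ can be confirmed on the small cyclic designs. The sketch below uses the first rows for N = 8 and N = 12 from the table of designs and leaves out the final row of minus signs, which would be an empty block; the helper names are illustrative.

```python
from itertools import combinations


def blocks_from_first_row(first):
    """One block per cyclic row: the set of columns (varieties) carrying a plus sign."""
    g = [s == "+" for s in first]
    n = len(g)
    return [{j for j in range(n) if g[(i - j) % n]} for i in range(n)]


def bibd_parameters(blocks, v):
    """Return the sets of block sizes k, replications r and pair counts lambda found."""
    ks = {len(block) for block in blocks}
    rs = {sum(1 for block in blocks if x in block) for x in range(v)}
    lambdas = {
        sum(1 for block in blocks if x in block and y in block)
        for x, y in combinations(range(v), 2)
    }
    return ks, rs, lambdas


if __name__ == "__main__":
    for first in ("+++-+--", "++-+++---+-"):        # N = 8 and N = 12
        blocks = blocks_from_first_row(first)
        print(len(first) + 1, bibd_parameters(blocks, len(first)))
```

For N = 8 ($m = 2$) each of the three sets should contain a single value, namely $k = r = 4$ and $\lambda = 2$, and for N = 12 ($m = 3$) $k = r = 6$ and $\lambda = 3$.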

Now consider a complete multifactorial design $F$ with $N$ rows and $L$ symbols; by complete we mean that the number of columns of $F$ is $(N-1)/(L-1)$. Referring to § 4, suppose that all the elements of the diagonal matrix $D$ are equal to $\sqrt{L}$, so that $Q'Q = L\,I$. Let the rows of $Q$ refer to the symbols $0, 1, 2, \ldots, L-1$. In the multifactorial design $F$ using these $L$ symbols replace each by the corresponding row of $Q$, omitting the 1 contributed by the first column. Add a first column of ones to the resulting matrix and obtain matrix $A$. Clearly $A'A = N\,I$ and hence $AA' = N\,I$. In any two rows of $F$ a pair of unequal symbols in the same column contributes $-1$ to the scalar product of the corresponding rows of $A$; a pair of equal symbols contributes $+(L-1)$. Supposing that in these two rows of $F$ there are $\lambda$ pairs of equal symbols in the same column, and remembering the 1 at the beginning of each row of $A$, we have

$1 + (L-1)\lambda - 1\,[(N-1)/(L-1) - \lambda] = 0$,

whence $\lambda = (N-L)/\{L(L-1)\}$.


Let the rows of $F$ refer to varieties and let each column represent $L$ blocks, one corresponding to each of the $L$ different symbols. By the result of the previous paragraph every two varieties occur together in the same number of blocks. We therefore obtain a balanced incomplete block design with parameters:

$r = (N-1)/(L-1)$; $v = N$; $b = rL$; $k = N/L$; $\lambda = (N-L)/\{L(L-1)\}$.

When $L = 2$, so that $N = 4m$, we can thus generate a large number of designs. When $L > 2$, we obtain balanced incomplete blocks with parameters

$r = 1 + L + L^2 + \ldots + L^{h-1}$,
$v = L^h$,
$b = L + L^2 + L^3 + \ldots + L^h$,
$k = L^{h-1}$,
$\lambda = 1 + L + L^2 + \ldots + L^{h-2}$,

where $L = p^m$, $p$ a prime, and $h > 1$.

The balanced incomplete block designs formed from multifactorial designs, for which

$r = (N-1)/(L-1)$; $v = N$; $b = rL$; $k = N/L$; $\lambda = (N-L)/\{L(L-1)\}$;

are in fact of a special kind and have been called by Bose (1942) affine resolvable. A balanced incomplete block design is resolvable if we can separate the $b$ blocks into $r$ sets of $n$ blocks each ($b = nr$) such that each variety occurs once among the blocks of a given set; and if in addition either (i) $b + 1 = v + r$ or (ii) any two blocks belonging to different sets have the same number of varieties in common, then the other is true and the design is called affine resolvable because of its relation to certain finite Euclidean geometries. Bose has shown

(1) If a resolvable balanced incomplete block design is such that any two blocks belonging to different sets have the same number of varieties in common, then $b + 1 = v + r$.
(2) If for a resolvable balanced incomplete block design $b + 1 = v + r$ then any two blocks belonging to different sets have the same number of varieties in common.

We have shown

(3) If a resolvable incomplete block design (i.e. one with $r$, $v$, $b$, $k$ given but not necessarily balanced in the sense that every two varieties occur together in the same number of blocks) has $b + 1 = v + r$ and is such that any two blocks belonging to different sets have the same number of varieties in common, then it is balanced.

To sum up, if a resolvable incomplete block design has any two of the following properties:
(i) any two blocks belonging to different sets have the same number of varieties in common,
(ii) balance,
(iii) $b + 1 = v + r$,
then it has the third. The orthogonal matrix method we have used to prove (3) can also be used to provide short proofs of (1) and (2).
Consequently a multifactorial design can be formed from a balanced incomplete block design provided that the latter is resolvable with parameters

$r = (N-1)/(L-1)$; $v = N$; $b = rL$; $k = N/L$; $\lambda = (N-L)/\{L(L-1)\}$.

Bose has pointed out that affine resolvable designs can be constructed from the affine geometry $EG(h, p^m)$ (our notation) by taking varieties as points and blocks as $(h-1)$-flats; this construction gives all the multifactorial designs for $L > 2$ which have so far been obtained.
The most general aspect of the multifactorial design is obtained by considering each assembly as a block and each value of each component as a variety. We obtain a partially balanced incomplete block design (Bose & Nair, 1939) with parameters:

$r = N/L$; $v = L(N-1)/(L-1)$; $b = N$; $k = (N-1)/(L-1)$;
$\lambda_1 = N/L^2$; $n_1 = [(N-1)/(L-1) - 1]L$; $\lambda_2 = 0$; $n_2 = (L-1)$;

$P_1 = \begin{pmatrix} [(N-1)/(L-1) - 2]L & (L-1) \\ (L-1) & 0 \end{pmatrix}$, $P_2 = \begin{pmatrix} [(N-1)/(L-1) - 1]L & 0 \\ 0 & (L-2) \end{pmatrix}$.

Although Bose & Nair state that general methods for the construction of partially balanced incomplete block designs are to appear, we have been unable to find them, so that this aspect of the multifactorial design does not yield more solutions of the problem.


12. SUMMARY


Methods are developed to avoid the complete factorial experiment in industrial experimentation when the number of factors is so large that the standard procedure is impracticable. By assuming a simplified linear hypothesis, the problem of determining main effects with maximum precision is reduced to a combinatorial one. Practically all useful solutions of this have been found when each factor appears at two levels, but the solutions for more than two levels are fairly limited. The relationship of these solutions to some encountered in balanced incomplete blocks has been discussed.

We are indebted to Mr G. A. Barnard for suggesting the problem and the method of approach by least squares; and to Dr Bronowski for drawing our attention to the useful


TABLE OF DESIGNS
A. Designs for L = 2. The first row of any cyclic design is given opposite N, the number of assemblies. As stated at the end of § 4 (III) the matrix A here consists of plus and minus ones: these are denoted below by plus and minus signs. There is always a final row of minus signs (the basic assembly) to be added. In the designs for N = 28, 52, 76, 100 the square blocks are permuted cyclically amongst themselves; in the three latter cases the extra column has alternate signs throughout apart from the corner element. The larger designs are grouped in fives for convenience.

N = 8.   +++-+--
N = 12.  ++-+++---+-
N = 16.  ++++-+-++--+---
N = 20.  ++--++++-+-+----++-
N = 24.  +++++-+-++--++--+-+----
N =28. First nine rows 
affine ~ t+o-tt+tt—---|-+---+--4t]4+4+-4+-4+4+-4+ 
| te-tt4e—---|--t+4--4¢--|-4+444-4¢4- 
-flats ; -+++4+4+---|4+---+--4+-]4-+-4+4+-4+4+ 
: - st ttetl[—-- 4-4 ---4t]4-444-4-4 
ained. —-- tte ttt][+—---- +t -—-[+t--ttte— 
geach | ----t+¢tett][—-4+-4---4¢-|-4+44+-4+-44+ 
‘all +ee—---t+-4t]--+--4+-4-|4-4+4-444+- 
rtially ++e—---+4-|4--4----4]44-44--44+ 
tee----4+4]-4+--4+-4¢--|-4+4-444-+ 
N = 32.  ----+ -+-++ +-++- --+++ ++--+ +-+-- +
N = 36.  (Obtained by trial) -+-++ +---+ ++++- +++-- +---- +-+-+ +--+-
N = 40.  Double design for N = 20.
N = 44.  ++--+ -+--+ ++-++ +++-- -+-++ +---- -+--- ++-+- ++-
N = 48.  +++++ -++++ --+-+ -+++- -+--+ +-++- --+-+ -++-- --+-- --
N = 52.
lanced Pe ee ee ee See ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee 
aspect First +/-+-------- Hee esetle tee - se -l4e-—- +e --telteee+e+—---- 
} eleven -|+4+-4+-4+-4-4/4+--4+-4+4-4¢-|4+-4--4+4--4]4--4+4--44-|4+-4+-4--4-4 
rows +/---+------ teete--—--$t]—- Ft t—--tel[tete--+4+--|--t++t+e+e-- 
~|-+44-4-4-4/4-4+--4-44-|-4+4-4--4+4-|4-4--4+4--4¢])-4+4-4-4--4) 
+|----- Ha HF tt tt tele e- -t eee -l- te ee--tt]|--- tt tett 
\ m|-+- tte —t— tlt tt ttle ttt t-te t-te -let-tt-4+-4- 
+|------- $e -|-- Ft Htettte— Hl tee Ht te etlet— te te— alte -- - t+ teet 
xperi- ~|-+-4+-444-4/-4+4-4+-4--4]-4¢4--4+4-4-[4--t+4t-4t¢4|4--4+-44-4- 
: +|--------- mm HF ttle tem tte Ht tl- beet eeeleee+---- +t 
ractic- —|-+-4-4-444/-4-44-4-4¢-|4--+4--44-|-t4--44-4-|4-4--4+-44- 
N = 56. Double design for N = 28.

N=60. ++-+4+ +-+-+ --+-- 444-4 F4+4-- +4444 ----- ++--- -+--- 

t+—-+4+ -t-4+- --+- | 

N =64. Double design for N = 32. 

N=68. ++--+ -4--+ 4---+ F44-4 -4F444 F4--4+ ---4t- FH4-4 F---- 

— Ft --- t+ HH-- + H—-F-F +- 

N=72. +++4++ +e—-t4+ +-+-- t+-t+4+ 4---+ +-4F-4+ +-4+-- -4H4H4- 4+--4F- 

to--tt $---+ --t+4+- +---+ ----- - 

N=7%. ++-4-4-4¢-4¢-4+-4¢-4+-4¢-4+-4+-4¢-4¢-4¢-4-4¢-4-4¢-4+-4-4-+4- | 
PS a Pe GS pp AR FOS Sgr ld FO A ig RR gn LI Fe Pa aed gt por: A ed 
ap ches Setbiss bes inches feud Madd din bats Abd Subd othad ins Gan; ches a Sale sede oe] ae Soy hee be { 

+-+¢-4¢-4¢-4-4¢-4-4¢-4-4-4-4-4-4-4- | 
++)/4+4]/4+4]--|--]--|--]+4+]--]+4+]+4]--]--]+4]-- 
+ -|+-|4+-|-4]-4+]-4]-4+]+-]-+]+-]+-|]-+]-4+]+-|-+ 
The first three rows are given; to obtain the complete design the square blocks are permuted
cyclically. The first column, apart from the corner element, has alternate signs.
N=80. ++4+-+4+ +--4+4+ +4 -4- -H-4¢4+ F444- F4--- -4H4-- -4-4-= 4-4-4 
Heo -t $He4—-- Fo ---— - HF - t+ FHF -HtH- -t-- ) 
N=84. ++ -+4 -=4+-4+ $+ 4—-- -+4H—--— —-H—-4-— HHEFHEH FH-F- -H4H4- F4--4+ ) 
— amr tt - $e HHH Ht HHH tHe F--t$He F---- HH HH -+- 

N = 88. Double design for N = 44.

N = 92. This design has not yet been obtained.

N =96. Double design for N = 48. 

N = 100. 

Sle ele ete ele ee ee ee ee ee ee eee ee ee en en 

-+------------ Hee -tttte—- t+ —--[t¢---- te ee—-- ttle eeeee—--- + -- 44 

Hem tam tae tata te ttn mth tn ttn ttn tt tant mt talent +--+ - 4-444 

---+---------- —-tt+--+ttt—--ttlt+tee—----4+444--l4 4+4+4+444+4+------ |? 

ttt —t— tt —t— t-te —-— te —-F-- te —-[4—t-- 4-44 -4¢--4]/4 -4+-4-4--4-4-41| 

----- tam mm malt te Ht te Ht tee -l- Feet teetl-- +e eeeete—-- 4 

mtr ttt rtm tits tom tt tet tl ttt t-te t-l-tt—-+-+-4+-- 4-4] 

------- Hmmm realm mete te Ht ttle +--+ tee - $4] -- tee e eee) 

-+-t-tt+- $4 —-4]-4¢4--¢4--4+4-4-[4--4+4-4--4+-44-|-4+-44-4-4-4--3 

--------- tam malt tr mt tm nm ttm ttle ttt tet Hl Ht tte teed 

Het totter ti tl[t--— t+ --t4--¢ 4-4-4 --4+454--4-4/-4-4-44-4-4-4- 

----------- +o--|+4+4+4+--44+--4+4--]--4+444--44+44--[44------4+4+444+ 

tate tet tte tle— t-te tet] -4 4-4 -- 44-4 --4¢]4--4-4-44-4-4- 
B. Designs for L = 3, 5, 7. The first column is given below and the complete design is
formed by permuting it cyclically (N - 1)/(L - 1) - 1 times and adding a row of zeros. The
corresponding orthogonal matrix A of § 4 (II) is obtained by replacing the component value
symbols of the design by the rows of Q (§ 4 (I)) with its first column suppressed.
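As a rough illustration of the rule just stated, the sketch below builds the design for N = 9, L = 3 from its listed first column and checks that every pair of columns shows each pair of symbols equally often. The function name `cyclic_multilevel_design` and the direction of the cyclic shift are assumptions made for the example only.

```python
import itertools
import numpy as np

def cyclic_multilevel_design(first_column: str, n_columns: int) -> np.ndarray:
    """Cyclic shifts of the first column side by side, plus a final row of zeros."""
    c = np.array([int(ch) for ch in first_column])
    columns = [np.roll(c, k) for k in range(n_columns)]
    body = np.column_stack(columns)
    return np.vstack([body, np.zeros(n_columns, dtype=int)])

N, L = 9, 3
# The first column plus (N-1)/(L-1) - 1 cyclic permutations of it.
n_columns = (N - 1) // (L - 1)
design = cyclic_multilevel_design("01220211", n_columns)

# With N = L^2 assemblies each of the L^2 symbol pairs should occur once
# in every pair of columns.
balanced = all(
    len(set(zip(design[:, c].tolist(), design[:, d].tolist()))) == L * L
    for c, d in itertools.combinations(range(n_columns), 2)
)
print(design)
print("balanced:", balanced)
```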


N = 9, L = 3.    01220211

N = 27, L = 3.   00101 21120 11100 20212 21022 2

N = 81, L = 3.   01111 20121 12120 20221 10201 10012 22021 00200 02222 10212 21210 10112 20102 20021 11012 00100

N = 25, L = 5.   04112 10322 42014 43402 3313

N = 125, L = 5.  02221 04114 13134 12021 10244 31402 00444 20322 32121 32404 22043 31230 40033 34014 41424 21430 34403
                 11241 03001 11302 33234 34231 01330 12243 2010

N = 49, L = 7.   01262 21605 32335 20413 11430 65155 61024 54425 03646 634
Added in proof
(1) Since this paper was written, one of the authors (J. P. B.) has obtained a design 
for L=3, N=18, n=7, by trial and error. It is known that this is the largest value of 
n possible. 
(2) The designs given above for L = 2 provide what is effectively a complete solution of 
the experimental problem considered by Hotelling (1944) and Kishen (1945). 
ON THE SOLUTION OF SOME EQUATIONS IN LEAST SQUARES
By D. V. LINDLEY 


In carrying out an experiment of the above type it is often not possible to arrange that the 
factors occur at exactly their required values: they will deviate by a small amount in either 
direction from the ideal aimed at. It is possible to allow for this in the case of the analysis of
the results of a two-level experiment. 


The equations to be solved by least squares when the factors are at the ideal values are

$$y_i = M + \sum_j a_{ij} m_j \qquad (i = 1, \dots, N;\ j = 1, \dots, n), \tag{1}$$

where the $a_{ij}$'s are plus or minus one and

$$m_j = \frac{\partial y}{\partial x_j}\, t_j,$$

where $2t_j$ is the difference between extreme and nominal, or what is usually half the tolerance
allowed. This assumes that the origin is taken halfway between the extreme and the nominal
so that the equations assume a more useful character. We can further suppose the extreme to
be at a higher value of $x_j$ than the nominal. Now suppose that there is a small deviation from
the ideal $e_{ij}$, given by actual minus ideal, corresponding to each $a_{ij}$. If this deviation occurs
at the extreme value, i.e. $a_{ij} = +1$, the new extreme value will be $t_j + e_{ij}$ from the origin: on
the other hand, if it occurs at the nominal, $a_{ij} = -1$, the new nominal value will be $t_j - e_{ij}$
from the origin. So in either case the deviation is $t_j + a_{ij} e_{ij}$ from the origin.
So equations (1) should now read

$$\begin{aligned}
y_i &= M + \sum_j a_{ij} \frac{\partial y}{\partial x_j} (t_j + a_{ij} e_{ij}) \\
    &= M + \sum_j \left( a_{ij} + \frac{e_{ij}}{t_j} \right) \frac{\partial y}{\partial x_j}\, t_j \qquad (\text{since } a_{ij}^2 = 1) \\
    &= M + \sum_j \alpha_{ij} m_j,
\end{aligned} \tag{2}$$

where $\alpha_{ij} = a_{ij} + e_{ij}/t_j$.


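Let us now write equations (1) in matrix notation

$$Y = AX, \tag{3}$$

i.e.

$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_N \end{pmatrix}
= \begin{pmatrix}
1 & a_{11} & a_{12} & \cdots & a_{1n} \\
1 & a_{21} & a_{22} & \cdots & a_{2n} \\
1 & a_{31} & a_{32} & \cdots & a_{3n} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & a_{N1} & a_{N2} & \cdots & a_{Nn}
\end{pmatrix}
\begin{pmatrix} M \\ m_1 \\ m_2 \\ \vdots \\ m_n \end{pmatrix}.$$

Equations (2) can then be written

$$Y = (A + B)X, \tag{4}$$

where $B$ is the matrix

$$B = \begin{pmatrix}
0 & e_{11}/t_1 & e_{12}/t_2 & e_{13}/t_3 & \cdots & e_{1n}/t_n \\
0 & e_{21}/t_1 & e_{22}/t_2 & e_{23}/t_3 & \cdots & e_{2n}/t_n \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
0 & e_{N1}/t_1 & e_{N2}/t_2 & e_{N3}/t_3 & \cdots & e_{Nn}/t_n
\end{pmatrix},$$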

which is dependent on the deviations from the ideal values divided by half the difference 
between nominal and extreme. 


Now the least square solution of (3) is easily found; it is

$$X_1 = (A'A)^{-1} A'Y.$$

We have, in order to solve (4), to solve the equations

$$(A + B)'Y = (A + B)'(A + B)X$$

or

$$A'Y + B'Y = A'AX + (A'B + B'A + B'B)X. \tag{5}$$

Since the elements of $B$ are small in comparison with those of $A$, an approximate solution
is provided by $X_1$, the solution of (3). If we put this in the smaller unknown term in (5) we
get a second approximation in the solution of

$$A'Y + B'Y = A'AX + (A'B + B'A + B'B)X_1,$$

i.e.

$$X_2 = (A'A)^{-1}[A'Y + B'Y - (A'B + B'A + B'B)X_1];$$

in general the $r$th approximation is given in terms of the $(r-1)$th by

$$X_r = (A'A)^{-1}[A'Y + B'Y - (A'B + B'A + B'B)X_{r-1}].$$

This then enables successive approximations to the solution to be found, and we can carry 
it on until the accuracy is as great as we desire. This will usually be dictated by the accuracy 
with which the $y_i$ and the $e_{ij}$ were measured. In one practical case it was not found necessary
to proceed beyond X,. Since (A’A)- is diagonal the solution at each stage is simple. Once 
$(A'B + B'A + B'B)$ has been calculated each stage only involves the computation of
$(A'B + B'A + B'B)X_{r-1}$ and the subtraction of it from $A'Y + B'Y$. It is important to notice
that $B$ has a column of 0's and $A$ a column of 1's corresponding to the mean $M$.
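The successive approximations can be tried numerically. The following is a minimal sketch, not the computation used in the practical case mentioned: the design (a full 2^3 factorial standing in for any design with diagonal A'A), the deviations in `B`, the values in `X_true` and the noise level are all made-up assumptions, and the direct solution by `numpy.linalg.lstsq` is included only as a check.

```python
import itertools
import numpy as np

# Ideal two-level design: a full 2^3 factorial, used as a small stand-in
# whose A'A is diagonal.
signs = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
N, n = signs.shape
A = np.column_stack([np.ones(N), signs])

# Hypothetical deviations e_ij / t_j, small beside the +/-1 entries of A.
rng = np.random.default_rng(0)
B = np.column_stack([np.zeros(N), rng.uniform(-0.05, 0.05, size=(N, n))])

# Made-up responses from assumed values of M, m_1, ..., m_n plus a little noise.
X_true = np.array([10.0, 2.0, -1.0, 0.5])
Y = (A + B) @ X_true + rng.normal(0.0, 0.01, size=N)

AtA_inv = np.diag(1.0 / np.diag(A.T @ A))   # A'A is diagonal here
rhs = A.T @ Y + B.T @ Y
C = A.T @ B + B.T @ A + B.T @ B             # the small term in equation (5)

X = AtA_inv @ (A.T @ Y)                     # X_1, the solution of (3)
for r in range(2, 60):
    X_next = AtA_inv @ (rhs - C @ X)        # X_r in terms of X_{r-1}
    converged = np.max(np.abs(X_next - X)) < 1e-12
    X = X_next
    if converged:
        break

# Direct least-squares solution of (4), for comparison only.
X_direct, *_ = np.linalg.lstsq(A + B, Y, rcond=None)
print(X, X_direct)
```

The loop stops once two successive approximations agree to a tolerance far finer than any practical measurement; in hand computation one would stop much earlier, as the text indicates.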

In the ideal experiment where the factors are all at nominal or extreme, the standard error
$s_k$ associated with $m_k$ is given in terms of the residual $s$ by

$$s_k^2 = (A_{kk}/D_A)\, s^2,$$

where $D_A$ = the determinant of $A'A$, and $A_{kk}$ = the minor of the $(k, k)$th element in $A'A$.

When $A'A$ is diagonal this is given by

$$s_k^2 = \frac{1}{c_{kk}}\, s^2,$$

where $c_{kk}$ is the $(k, k)$th element of $A'A$.
In the practical case we then have

$$s_k^2 = \{(A + B)_{kk}/D_{A+B}\}\, s^2,$$

which to the first order in the $e_{ij}$ is

$$s_k^2 = (A_{kk}/D_A)\, s^2 = \frac{1}{c_{kk}}\, s^2$$

as before.
Thus we have obtained without too large an amount of labour the solutions of the equa- 
tions as accurately as we need and the standard errors of these solutions. 


This work was carried out as part of the Research and Development programme of the 
Ministry of Supply (S.R. 17) and appears by permission of the Chief Scientific Officer. 
SOME GENERALIZATIONS IN THE MULTIFACTORIAL DESIGN
By R. L. PLACKETT 


1. The following remarks may be regarded as a sequel to a previous paper (Plackett & 
Burman, 1946), although the notation has been changed and compressed for convenience; 
it is hoped that no confusion thereby ensues. We consider first the determination of main 
effects, and find what modifications are required when certain of the orthogonal transforma- 
tions previously used are no longer orthogonal. 


2. We assume that $y_r$, the true value of the measurement on the $r$th assembly, is expressible
in the form

$$y_r = \sum_{A} A_j \qquad (r = 1, 2, \dots, N;\ j = 1, 2, \dots, a),$$

there being $n$ components $A, B, \dots, K$, where $A_j$ is the effect due to component $A$ at its $j$th
value.

The vector $(A_1, A_2, \dots, A_a)$ is denoted by $\mathbf{A}'$. Make the transformation $\mathbf{A} = A\mathbf{a}$, i.e.

$$A_i = \sum_{j=1}^{a} a_{ij} a_j,$$

to a new system of variables $(a_1, a_2, \dots, a_a)$ represented by $\mathbf{a}'$; $A$ is a non-singular $a \times a$
matrix, whose first column consists entirely of ones. Suppose $A_i$ and $B_j$ appear together
in a design $w_{ij}$ times; then $A_i$ appears $w_{i0}$ times where

$$w_{i0} = \sum_{j=1}^{b} w_{ij},$$

and similarly $B_j$ appears $w_{0j}$ times. For component $B$, $\mathbf{B} = B\mathbf{b}$, where $\mathbf{b}$ is a column vector
and matrix $B$ is a non-singular $b \times b$ matrix whose first column is all ones. With

$$M = a_1 + b_1 + \dots + k_1,$$

we write

$$X' = (M, a_2, \dots, a_a, b_2, \dots, b_b, \dots, k_2, \dots, k_k),$$

and $Y = PX$, the first column of $P$ consisting entirely of ones.


We find what conditions are satisfied if $P'P = N \cdot I$. Within the columns corresponding
to component $A$ we have

$$\sum_i a_{ij} a_{ik} w_{i0} = N\delta_{jk}. \tag{1}$$

Again, considering the product of a column of component $A$ with one of component $B$,

$$\sum_{i,k} a_{ij} b_{kl} w_{ik} = N\delta_{1j}\delta_{1l}. \tag{2}$$

In equation (1) put $j = 1$ and obtain

$$\sum_i a_{ik} w_{i0} = N\delta_{1k},$$

a set of linear equations for the $w_{i0}$, the solution of which is

$$w_{i0} = \sum_k \alpha_{ki} N\delta_{1k} = N\alpha_{1i}, \tag{3}$$

where $\alpha_{ik}$ is an element of matrix $A^{-1}$. Similarly $w_{0k} = N\beta_{1k}$.
It follows immediately from equation (2) that

$$w_{ik} = N\alpha_{1i}\beta_{1k}.$$

Therefore

$$N w_{ik} = w_{i0} w_{0k}. \tag{4}$$

Finally, it is more convenient to express equation (1) in terms of the elements of $A^{-1}$
rather than those of $A$. Suppose $W$ is the matrix whose elements are $\delta_{ij} w_{i0}$. Then

$$A'WA = N \cdot I, \quad \text{i.e.} \quad N \cdot A^{-1} W^{-1} (A')^{-1} = I,$$

therefore

$$\sum_j \alpha_{ij} \alpha_{kj} N w_{j0}^{-1} = \delta_{ik},$$

i.e.

$$\sum_j \alpha_{ij} \alpha_{kj} (\alpha_{1j})^{-1} = \delta_{ik}. \tag{5}$$

Condition (4) arises also in analysis of variance. If matrices $A$ and $B$ have orthogonal
columns, then (3) becomes $w_{i0} = N/a$ and similarly $w_{0k} = N/b$, so that $w_{ik} = N/ab$. There
is no difficulty in showing that conditions (3), (4) and (5) are sufficient for the validity of
equations (1) and (2).

With designs and matrices satisfying these conditions we may therefore determine
$a_2, a_3, \dots, a_a, b_2, b_3, \dots, b_b$ with maximum precision and our estimates of these parameters are
independent. In particular cases, such non-orthogonal functions of the $A_i$ and $B_j$ may be
of greater interest or moment than the orthogonal functions usually chosen. Two conclusions
may thus be drawn:

(i) Having defined a set of linearly independent linear functions, not necessarily orthogonal,
of the $A_i$, then these functions may be determined as independent statistics with
maximum precision, provided (3), (4) and (5) are satisfied for all components present.

(ii) If, for any reason, a factorial or multifactorial design is incomplete owing to loss of
observations, then provided (4) is satisfied it may be possible to find from (3) and (5) linear
functions of the $A_i$ which can be determined as independent statistics with maximum
precision.

3. Designs for which $N w_{ik} = w_{i0} w_{0k}$ may be constructed immediately from those given
in Plackett & Burman (1946). Consider the design for $N$ assemblies in which each component
appears at $L$ values. For such a lay-out $w_{i0} = w_{0k} = N/L$ and $w_{ik} = N/L^2$ $(i, k = 1, 2, \dots, L)$.
Corresponding to component $A$, divide the $L$ symbols into groups of $u_1, u_2, \dots, u_p$ so that

$$\sum_{i=1}^{p} u_i = L;$$

each member of a group is equal, so that the $L$ symbols are successively replaced by
$1, 1, \dots, 1, 2, 2, \dots, 2, \dots, p, p, \dots, p$. This transforms component $A$ into one which appears
at $p$ values. Similarly, corresponding to component $B$ we have groups $v_1, v_2, \dots, v_q$ so that

$$\sum_{k=1}^{q} v_k = L.$$

We now have $w_{i0} = u_i N/L$, $w_{0k} = v_k N/L$, and $w_{ik} = u_i v_k N/L^2$. Thus $N w_{ik} = w_{i0} w_{0k}$.
Condition (3) gives $u_i/L = \alpha_{1i}$ and $v_k/L = \beta_{1k}$; a suitable value of $L$ is now chosen so that
$u_i$ and $v_k$ are all integers, and the design constructed.
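The grouping of symbols can be illustrated on a toy lay-out. In this sketch the lay-out is simply every pair of values of two components at L = 4 (an assumption standing in for a design of Plackett & Burman), and the group sizes 2, 1, 1 are chosen arbitrarily for the example.

```python
import itertools
import numpy as np

L = 4
# Stand-in lay-out: every pair of values once (N = L^2), so w_ik = N / L^2 = 1.
layout = list(itertools.product(range(L), repeat=2))
N = len(layout)

# Group sizes u = (2, 1, 1): the symbols 0, 1, 2, 3 are replaced by 0, 0, 1, 2.
relabel = {0: 0, 1: 0, 2: 1, 3: 2}
p = 3

# Count the coincidences w_ik of the relabelled values of the two components.
w = np.zeros((p, p), dtype=int)
for i, k in layout:
    w[relabel[i], relabel[k]] += 1

w_i0 = w.sum(axis=1)   # appearances of each value of the first component
w_0k = w.sum(axis=0)   # appearances of each value of the second component
print(w)
print(np.array_equal(N * w, np.outer(w_i0, w_0k)))   # condition (4)
```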
4. We may extend the scope of our inquiry to include interactions, prefacing our extension
by clarifying what appears to be a known result which gives orthogonal functions of
the observations corresponding to interaction degrees of freedom (Fisher, 1942). When
first-order interactions are present the effect due to the $i$th value of component $A$ and the
$j$th value of component $B$ is

$$R_{ij} = A_i + B_j + (AB)_{ij},$$

the term $(AB)_{ij}$ representing the interaction. We again make transformations $\mathbf{A} = A\mathbf{a}$
and $\mathbf{B} = B\mathbf{b}$ where the matrix $A$ has a first column of ones and all columns are orthogonal
to each other, similarly for matrix $B$. Consider first the quantities $(A_i + B_j)$. With $M = a_1 + b_1$
we can transform these into $(M, a_2, a_3, \dots, a_a, b_2, b_3, \dots, b_b)$, and the matrix $R_1$ of the
transformation will consist of a first column of ones, followed by $(a - 1)$ columns formed by
repetitions of the rows of $A$, followed by $(b - 1)$ columns which are repetitions of the rows
of $B$. Clearly the columns of $R_1$ are mutually orthogonal. There remain $(a - 1)(b - 1)$ columns
$R_2$ to be chosen so that the matrix $R = (R_1 : R_2)$ is a square matrix with mutually orthogonal
columns. Corresponding to the columns of $R_2$ we may choose quantities $r_{a+b}, r_{a+b+1}, \dots, r_{ab}$
into which the $(AB)_{ij}$ may be transformed.

The columns of $R_2$ may be chosen arbitrarily, but there are two methods whereby they
may be written down at once. The first method is the one referred to at the beginning of this
section, which consists in taking the $(a - 1)(b - 1)$ inner products of a column of $R_1$ (not the
first) belonging to component $A$ with a column of $R_1$ (not the first) belonging to component
$B$. Thus take the inner product of the $t$th and $(a - 1 + u)$th columns of $R_1$. The scalar product
of this column with the $v$th column of $R_1$ $(2 \le t, v \le a;\ 2 \le u \le b)$ is

$$\sum_{i,j} a_{it} b_{ju} a_{iv}.$$

Keeping $i$ fixed and summing over $j$ gives zero. Hence the columns of $R_2$ are orthogonal to
those of $R_1$. That they are orthogonal between themselves follows similarly from the fact that

$$\sum_{i,j} a_{it} b_{ju} a_{iv} b_{jw} = 0 \quad \text{unless } t = v \text{ and } u = w.$$
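The first method can be checked on a small case. In the sketch below the matrices `A` and `B` are hypothetical 3 x 3 examples with a first column of ones and mutually orthogonal columns; the "inner product" of two columns is taken, as the argument above requires, to be their element-by-element product over the ab combinations of values.

```python
import numpy as np

# Hypothetical transformation matrices (a = b = 3): first column of ones,
# all columns mutually orthogonal.
A = np.array([[1, -1, 1], [1, 0, -2], [1, 1, 1]])
B = A.copy()
a, b = A.shape[0], B.shape[0]

# One row of R for each combination (i, j) of values of components A and B.
i_idx, j_idx = np.divmod(np.arange(a * b), b)

# R_1: a column of ones, then columns 2..a of A and 2..b of B with rows repeated.
R1 = np.column_stack([np.ones(a * b, dtype=int), A[i_idx, 1:], B[j_idx, 1:]])

# R_2: product of each non-first column of A with each non-first column of B.
R2 = np.column_stack([A[i_idx, t] * B[j_idx, u]
                      for t in range(1, a) for u in range(1, b)])

R = np.column_stack([R1, R2])
gram = R.T @ R
print(gram)   # off-diagonal zeros mean all columns are mutually orthogonal
```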


The second method is at present available only when $a = b = L = p^m$ ($p$ a prime and $m$ an
integer). We refer to the modified factorial designs in § 7 of Plackett & Burman (1946). With
matrix $B$ equal to matrix $A$ the symbols in the design for $N = L^2$ are replaced by the
corresponding rows of matrix $A$, the first column of $A$ being omitted.

Writing $ab = r$; $a_2, a_3, \dots, a_a, b_2, b_3, \dots, b_b$ respectively equal to $r_2, r_3, \dots, r_{a+b-1}$; $\mathbf{R}$ the
column vector of elements $R_{ij}$; $\mathbf{r}$ the column vector of elements $r_1, r_2, \dots, r_r$; we now have $\mathbf{R} = R\mathbf{r}$
where $R$ is a matrix, elements $r_{ij}$, whose first column consists of ones, all columns being
mutually orthogonal.

5. Now regarding $R$ as a single component and $R_h$ $(h = 1, 2, \dots, r)$ as the effect due to its
$h$th value, we must have for maximum precision of determination of $R_h$ and $C_k$, the effect
due to component $C$ at its $k$th value, that $w_{hk} = N/rc$. Thus if $w_{ijk}$ denotes the number of
occurrences together of $A_i$, $B_j$ and $C_k$, then $w_{ijk} = N/abc$. Hence for optimum determination
of the effects $A$, $B$, $(AB)$ and $C$, all values of components $A$, $B$, $C$ must appear together
equally frequently. This condition, however, leads to the optimum determination of effects
$B$, $C$, $(BC)$ and $A$; also of effects $C$, $A$, $(CA)$ and $B$; because any first-order interaction is
connected with two components only, all combinations of values of which appear equally
often with all values of the third. Hence if for any three components in a design, all
combinations of values appear equally frequently, all first-order interactions as well as main
effects may be determined with maximum precision. In order to estimate a particular
interaction $(AB)$, all combinations of values of $A$ and $B$ must appear equally often with the values
of any other component present; in which case the first-order interactions between $A$ and
any other component, and between $B$ and any other component, are automatically
determined. Generally, therefore, having decided on those interactions which are of interest,
$N = K \times$ L.C.M. of all relevant triplets $abc$; when all components appear at $L$ values each we
obtain $N = KL^3$.
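As a small worked instance of the rule for N, the sketch below finds the least common multiple over the relevant triplets. The numbers of values and the choice of interaction are invented for the illustration, and "relevant triplet" is read here as the two components of an interaction of interest together with any one other component; `math.lcm` with several arguments needs Python 3.9 or later.

```python
import math

# Hypothetical numbers of values for four components.
levels = {"A": 2, "B": 3, "C": 4, "D": 2}
# Interactions decided to be of interest (an assumption for this example).
interactions = [("A", "B")]

# Each interaction contributes one triplet for every other component present.
triplet_sizes = []
for first, second in interactions:
    for third in levels:
        if third not in (first, second):
            triplet_sizes.append(levels[first] * levels[second] * levels[third])

base = math.lcm(*triplet_sizes)   # N must be K * base for some integer K
print(triplet_sizes, base)
```

Whether a design with N equal to a small multiple of this number actually exists is a separate combinatorial question.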

This result is immediately generalized and for interactions between $(t - 1)$ components
appearing with other components we must have $N = KL^t$ when each component appears
at $L$ values. As interactions of higher order are included, we obtain a whole series of designs
building up to the complete factorial design or replications thereof.

6. Similarly, the results concerning non-orthogonal transformations may be extended.
We use the same notation with respect to component $R$; the matrix $R_1$ is constructed in the
same manner from matrices $A$ and $B$, but no longer consists of orthogonal columns; and
the columns of $R_2$ are no longer orthogonal. The same meaning is attached to $w_{ijk}$ as in § 5
and $w_{ij0}$, $w_{i00}$ are defined by analogy with § 2.

For orthogonality between the columns of the matrix formed by repetitions of the rows of
$R_1$ and corresponding to the main effects of components $A$ and $B$, we have $N w_{ij0} = w_{i00} w_{0j0}$.
If now the columns of $R_2$ are formed by inner products of pairs of columns of $R_1$ as in § 4, then
a repetition of the argument used there, together with the condition that $N w_{ij0} = w_{i00} w_{0j0}$,
shows that in the complete design the columns of the matrix corresponding to interaction $(AB)$
and formed by repetition of the rows of $R_2$ are orthogonal, both to columns corresponding
to main effects $A$ and $B$ and between themselves.

From the orthogonality of the columns of component $R$ to those of component $C$ we deduce
$N w_{ijk} = w_{ij0} w_{00k}$, giving finally

$$N^2 w_{ijk} = w_{i00} w_{0j0} w_{00k}, \tag{6}$$

which again leads to the optimum determination of interactions $(BC)$ and $(CA)$. Further
generalizations of condition (4) follow immediately, together with appropriate modifications
of conclusions (i) and (ii) in § 2; conditions (3) and (5) must always be satisfied as they stand,
apart from slight changes of notation necessary in condition (3).

7. We now give an illustration of the design of an experiment where main effects only of
components are considered, but the transformations made on the $A_i, B_j, \dots$ are not
orthogonal. Suppose that each component appears at three values such that the difference between
the high and medium values is twice the difference between medium and low (i.e. we might
be considering resistors of value 2000, 4000 and 8000 ohms). The effects of low, medium and
high values for component $A$ are respectively $A_1$, $A_2$ and $A_3$, and we shall be interested in
considering $2A_1 - 3A_2 + A_3$ which will be zero if the $A_i$ are linear functions of resistance
value. We shall also be interested in $A_3 - A_1$, measuring the total change in the measurement
due to varying the resistance over the whole range. It should be remarked that even if
component values are equally spaced (e.g. if in our example the three resistor values were
2000, 4000 and 6000 ohms) we might wish to test the hypothesis that the effect of a shift
from 4000 to 6000 ohms is twice that of a shift from 2000 to 4000 ohms: this would again
involve us in testing whether $2A_1 - 3A_2 + A_3$ departed significantly from zero.

In order that all conditions may be satisfied we take:

$$\begin{aligned}
a_1 &= xA_1 + yA_2 + zA_3, \\
a_2 &= -sA_1 + sA_3, \\
a_3 &= 2tA_1 - 3tA_2 + tA_3,
\end{aligned}
\qquad \text{i.e. matrix } A^{-1} =
\begin{pmatrix} x & y & z \\ -s & 0 & s \\ 2t & -3t & t \end{pmatrix}.$$

Applying condition (5) we obtain

$$x + y + z = 1, \qquad -2/x + 1/z = 0, \quad \text{i.e. } x = 2z.$$

Taking $x = \tfrac12$, $y = \tfrac14$, $z = \tfrac14$ gives

$$A^{-1} = \begin{pmatrix}
1/2 & 1/4 & 1/4 \\
-1/\sqrt{6} & 0 & 1/\sqrt{6} \\
2/\sqrt{48} & -3/\sqrt{48} & 1/\sqrt{48}
\end{pmatrix}.$$

Thus $w_{10} = N/2$, $w_{20} = N/4$ and $w_{30} = N/4$, by condition (3). If we suppose that matrix $B$
is identical with matrix $A$, we obtain for the matrix whose elements are $w_{ik}$, the number of
coincidences of $A_i$ and $B_k$, the following:

$$\begin{pmatrix}
N/4 & N/8 & N/8 \\
N/8 & N/16 & N/16 \\
N/8 & N/16 & N/16
\end{pmatrix}.$$
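The numerical values in this illustration can be checked directly. The sketch below takes the inverse matrix with rows (1/2, 1/4, 1/4), (-1/sqrt 6, 0, 1/sqrt 6) and (2, -3, 1)/sqrt 48, and assumes N = 16 purely so that the counts come out as whole numbers; it tests condition (5), recovers the first column of ones in the matrix of the transformation, and forms the coincidence counts for B identical with A.

```python
import numpy as np

N = 16  # assumed here so that N/16 is a whole number
A_inv = np.array([
    [1 / 2, 1 / 4, 1 / 4],
    [-1 / np.sqrt(6), 0, 1 / np.sqrt(6)],
    [2 / np.sqrt(48), -3 / np.sqrt(48), 1 / np.sqrt(48)],
])
A = np.linalg.inv(A_inv)

# Condition (3): appearances of each value are N times the first row of A^{-1}.
w_i0 = N * A_inv[0]
W = np.diag(w_i0)

# Condition (5): sum over j of alpha_ij * alpha_kj / alpha_1j should be delta_ik.
cond5 = (A_inv / A_inv[0]) @ A_inv.T

# Coincidences of values of A and B when matrix B is identical with matrix A.
w_ik = N * np.outer(A_inv[0], A_inv[0])

print(np.round(cond5, 12))
print(np.round(A, 4))
print(w_ik)
```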
With five components whose values are represented by 0, 1, 2 a design of the required type
can be constructed according to the method of § 3 from the design for N = 16, L = 4 (as
obtained by the methods of Plackett & Burman (1946)) by replacing the four symbols for the
component values by 0, 0, 1, 2. This gives:

00000   00021   10102   20210
00000   00012   10201   20120
01111   01200   11020   21002
02222   02100   12010   22001
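The sixteen assemblies can be checked against condition (4) mechanically. In the sketch below the rows are typed in from the lay-out above; one entry is incomplete in the available copy and is taken here as 20120, the only reading under which every value appears the required number of times. The helper `pair_counts` is a hypothetical name used only for this example.

```python
from itertools import combinations
import numpy as np

rows = [
    "00000", "00000", "01111", "02222",
    "00021", "00012", "01200", "02100",
    "10102", "10201", "11020", "12010",
    "20210", "20120", "21002", "22001",
]
design = np.array([[int(ch) for ch in r] for r in rows])
N, n_components = design.shape

def pair_counts(col_a, col_b, levels=3):
    """Matrix of coincidences w_ik between the values of two components."""
    w = np.zeros((levels, levels), dtype=int)
    for i, k in zip(col_a, col_b):
        w[i, k] += 1
    return w

# Condition (4), N * w_ik = w_i0 * w_0k, for every pair of components.
satisfied = True
for c, d in combinations(range(n_components), 2):
    w = pair_counts(design[:, c], design[:, d])
    if not np.array_equal(N * w, np.outer(w.sum(axis=1), w.sum(axis=0))):
        satisfied = False

print(pair_counts(design[:, 0], design[:, 1]))
print("condition (4) holds for all pairs:", satisfied)
```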


When designs of the complete factorial or multifactorial type are not so available the 
construction of such a design without requiring an excessive number of assemblies will 
often necessitate a certain amount of ingenuity. 

It is perhaps of interest to remark on a certain transformation, sometimes made, which
results in the matrix $A$ consisting of a first column and a leading diagonal of ones, all other
elements being zero. It will be clear from the foregoing analysis that in this case it is
impossible in any design whatever to determine the $a_i$ with maximum precision, and some
alternative transformation should be used.

We conclude by pointing out that a large class of combinatorial problems has been raised, 
of which a comparatively small proportion may be solved by the methods so far evolved. 
Statements in the foregoing that an experimental design must take a certain form should 
not be taken as implying that the relevant combinatorial lay-out necessarily exists. 


This paper is published as part of the programme of the Ministry of Supply, Research and 
Development (S.R.17), appearing by permission of the Chief Scientific Officer. 
THE GROWTH, SURVIVAL, WANDERING AND VARIATION OF
THE LONG-TAILED FIELD MOUSE, APODEMUS SYLVATICUS


By H. P. HACKER AND H. S. PEARSON


II. SURVIVAL. By H. P. HACKER 
1. INTRODUCTION


The purpose of this paper is to bring together the data we have collected on the length of 
life under natural conditions of the long-tailed field mouse. The trapping, marking and 
releasing have been done only during winter months in order to avoid interference with 
breeding, and so to leave the population as intact as possible. 

The work began in 1936-7 and was continued for six winters. The data on growth have 
been described by Hacker & Pearson (1944), and further papers on travel and variation 
are projected. Some account of the disposition of traps and the amount of movement of 
the mice is necessary for a discussion of the evidence on the length of survival. A map of 
the area trapped is therefore included in this paper and will also apply to the more detailed 
account of the distances travelled which will be published later. 


2. METHOD OF MARKING MICE 


In the 1936-7 season we started by marking the right hind foot of each mouse with one 
of the metal rings used for identifying canaries. These were made of aluminium with
numbers stamped on them; they fitted the limbs well but the metal was too soft. By 
rubbing on the ground and by being gnawed by the mice some of the numbers were made 
difficult to read, and the edges of some of the rings became sharp and jagged, injuring the 
mice so severely that we had to kill them with chloroform. 
Evans (1942, p. 184) used rings, and when the foot swelled too badly for the ring to be
removed he amputated the leg. He found that ‘the majority of these individuals were subse- 
quently recovered in later censuses and appeared to be in healthy condition’. There must 
be some doubt, however, whether such mutilated individuals should be included in survival 
records or analyses of travels. 

After a trial of two weeks we gave up using rings and have not tried the nickel rings 
described by Chitty (1937, p. 41). He has kindly sent us samples of his rings for com- 
parison and the metal is much harder than ours, but after our experience of puncturing 
ears as a method for marking mice we would not think of returning to ringing. One definite 
advantage of using metal rings is the possibility of recovering them from the voidings of 
predators and so tracing the fate of the mice. We once found a mummified mouse, and by 
soaking the ears in water could read its number, but it would be impossible to do this after 
the body had been eaten. 

The marking of the ears is done without an anaesthetic with the animal held lightly in 
an assistant’s hand, and it very rarely squeaks, bites or struggles after the first attempt 
to escape from the hand. If a mouse does bite or behave obstreperously, it is quite often 
found to have behaved in the same way before; for instance, one out of a family of six 
reared from birth maintained a reputation for biting whenever it was touched. The 
membranous ear of Apodemus seems therefore almost insensitive, and we have not had 
any reason to suspect that the punctures affect the life of the mouse in any way. 

The instrument we use is a leather punch with an interchangeable die which makes holes 
of about 1-5 mm. in diameter. This is rather clumsy but costs only two shillings and is quite 
effective. A more elegant and efficient, but much more expensive, instrument is a dentist’s 
rubber dam punch. The small spring punch which chicken rearers use is quite useless for 
this purpose. We tried several and did not get clean punctures; moreover, the slight click 
that this instrument makes is more disturbing to the mouse than the actual puncture. 

The four quarters of the ear pinna give distinctive sites for puncturing, and by combina- 
tions of not more than three punctures in each ear more than 1000 different patterns can 
be made. By using each pattern twice, once for each sex, we could use the simpler patterns 
with only one or two punctures in each ear. These are easier to make and to identify, and 
we had enough to choose from without having to clip the toes as recommended by Burt 
(1940, p. 12). The lower anterior quarter of the pinna is more fleshy than the other three, 
so that punctures here tend to heal up and to need puncturing again when the mouse is 
recaught and examined. This difficulty can be largely overcome by making them as high 
up the margin of the ear as is possible without risk of confusion with an upper puncture. 

Incidentally, we may remark that the method totally failed when applied to Clethrionomys
(Evotomys), whose ears are short, hairy and thicker in texture. The punctures heal
up readily, leaving puckered scars. We therefore did not try to mark individuals of this 
genus but merely made a single puncture on the left ear of all those caught, in order to show 
in any future trapping that it was not a new mouse. Even then a puckered left ear and a 
normal right ear was often the only sign that the mouse had possibly been caught before. 


3. METHOD OF TRAPPING 


We used the Selfridge trap described by Elton et al. (1931, p. 714), and by taking out
two bars from the back added the nest box introduced for the Tring trap by Chitty
(1937, p. 39).
The main disadvantage of this cheaper trap, the danger to the tail of the mouse, we tried
to circumvent by fixing a small stop to prevent the door touching the floor when it shuts.
But an injury to the tail is a minor disaster to the mouse, as it is well known that Apodemus
can escape by shedding part of its tail (Barrett-Hamilton & Hinton, 1910-21, p. 502). 
Sumner & Collins (1918, p. 1) have some interesting records of this faculty among 
American species of mice. One of us found the skin of a tail at the mouth of a mouse hole, 
evidence of a narrow escape from a predator. What happens is that the skin slips off very 
readily and the exposed tendons and bone shrivel, or are gnawed off, leaving a stub. Some 
very remarkable deformities due to this and other injuries were found in mice caught for 
the first time, 57 out of 1000 such mice showing injuries not due to trapping, whereas 84 
mice out of 1000 consecutive catches were found injured by the trap. As even very minor 
injuries were recorded among these with a view to their use in identification on a future 
occasion, the rate of injury, though regrettable, was not much greater than that which 
occurs normally in nature, and some of those we inflicted were observed to heal completely. 

In passing we might record a difference in liability to accident noticed in getting out 
these figures. One male injured its tail three times out of the six times it was caught; on 
the other hand, a male of similar size was caught ten times without injury and a female 
fourteen times. A similar individuality to that noted in the matter of biting seems 
to be present, and it may be that the more ‘cautious’ ones tend to get their tails injured 
by being slower in entering the traps. 

Undoubtedly the comfort and well-being of the captive mice is greatly increased and the 
number of deaths reduced by the nest box devised by Chitty. Their comfort is also increased 
(1) by putting the trap under cover of vegetation when possible, (2) by pointing the trap 
away from the prevailing wind and weather, and (3) by putting the trap on a slope so that 
water does not run into the nest box. The first two points probably add to the efficiency of 
the trapping as (1) the mice seem to like cover, and (2) sticks and leaves do not collect in 
the entrance and prevent the trap shutting. 

A mechanical difficulty met with should be mentioned. With frequent rebaiting, the bar 
on which the bait hook hangs becomes so loose that the mouse can push it aside and 
escape. This can be prevented by the simple precaution of pushing the nest tin over the 
end of the bar to keep it fixed in position. 


4. ARRANGEMENT OF TRAPS 


In the 1936-7 season we worked at Studland in Dorset on the area that Diver (1933) 
has studied in such detail. Our main object was to find the distribution of mice in
relation to the different types of habitat he had mapped out, and for this purpose we laid 
our traps at the most likely spots in each of the areas we were testing, without regard to 
the distance between these areas. We thus gained our own experience of how far Apodemus 
travels, and from our records there (which we hope to publish with maps) and from 
Chitty’s (1937) useful summary of previous records of distances travelled by small 
mammals, we decided to use a grid of 100 yard squares in our routine trapping during the 
following winters. This we were able to do in Holwood Park, Keston, by permission of the 
late Lord Stanley and of Lady Stanley. 

The reference map shows the part of the estate we used, with the grid of trapping centres 
dotted on it. The continuous line includes the two areas over which we trapped in 1937-8.
The western area consists of park and woodland, while the eastern area is mainly farmland,
grazing and arable. The broken line is the rectangle, 700 by 300 yd., that we used in 1938-9, 
partly overlapping the wooded area trapped in 1937-8 and extending into the woods west 
of the public footpath (F.P. on map); as this area was too large to cover in one week the 
nine new sites in the west woods were trapped first and the twelve old sites in the east 
woods in the following week. The dotted line shows the area trapped in the next two years 
when the same middle strip of seven trapping centres was used with every alternate 
trapping centre on either side. 

At each trapping centre we put out six traps in the form of a hexagon with sides of 
10 yd., each trap site being also 10 yd. from the centre. Having marked the site we set the 
trap in the most favourable spot within a yard or so. Since neighbouring hexagons were 
trapped simultaneously it is not likely that a trapping centre would have caught mice living 
near another centre, but only those that lived in the intervening area. The hexagons on the 
edge of a trapped area have, however, a larger region from which to draw mice than 
hexagons in the interior, so that sometimes we shall describe results from the central and 
peripheral parts separately. 
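
The geometry of one trapping centre can be sketched as follows. This is only an illustration of the layout described above, assuming a regular hexagon on flat ground with coordinates in yards; the function name `hexagon_trap_sites` and the coordinate convention are ours, not the authors'.

```python
import math

def hexagon_trap_sites(centre, radius_yd=10.0):
    """Return six (x, y) trap positions, in yards, around one trapping centre."""
    cx, cy = centre
    sites = []
    for k in range(6):
        angle = math.radians(60 * k)  # neighbouring vertices are 60 degrees apart
        sites.append((cx + radius_yd * math.cos(angle),
                      cy + radius_yd * math.sin(angle)))
    return sites

sites = hexagon_trap_sites((0.0, 0.0))
# In a regular hexagon the side equals the centre-to-vertex distance,
# which is why both are quoted as 10 yd.
side = math.dist(sites[0], sites[1])
print(f"side length = {side:.1f} yd")
```

In the field each trap was then moved to the most favourable spot within a yard or so of its marked site, so these coordinates are nominal positions only.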


5. LENGTH OF TRAPPING PERIODS 


In the 1936-7 season at Studland we gained the impression that if kept out for three or four 
nights a group of traps would catch most of the mice in the immediate neighbourhood, and 
that any mice caught later than this tended to come from a greater distance. An example 
of the kind of evidence on which this opinion was based may be quoted. 

During nine nights in February eighteen traps were set in a hazel wood and the following 
are the numbers of mice caught each night: 

[The nine nightly catch figures are not legible in the source.] 

They show a rapid decrease from nine on the first day to none at all on the fourth and fifth 
nights, and then a renewal of catches suggesting outside mice coming into the area. The 
same impression is gained from Table 1. The relation between the day on which a mouse 
was caught and the distance it lived from the traps is further dealt with in § 13. 

In the 1937-8 season we used 3 days as the routine period. In 1938-9 we increased this 
to 4 days in most months but to 5 days in November in the east woods, and to 6 days in 
March and April in both woods. In later years routine trapping was only done in December 
and March, and we increased the time to as many days as seemed convenient or necessary, 
our longest time being 10 days. 

Although the catch of the first day or two may not always be good, as in March 1941 
(Table 1), the results usually confirmed our choice of 4 days for the minimum period of 
trapping, as in March 1940 (Table 2). 

The chief reason for this difference in the rate of catching was undoubtedly the weather. 
We can be fairly sure that a moonless or cloudy night is favourable to wandering, a wild 
night of south-west wind and rain seeming as good as any.* These conditions are perhaps 
unfavourable to owls. Snow and possibly also hoar frost seem to be unfavourable condi- 
tions. We did not keep weather records but only notes of striking changes in the weather. 
If later on we can compare our catches with the nearest meteorological record we may be 


* Burt (1940, p. 25) came to much the same conclusion about the effect of weather, and Evans (1942, p. 190) 
emphasizes the effect of abnormally wet weather in increasing the number caught. 

able to speak with more certainty about the effect of weather. But it will be seen in the 
following section on the efficiency of trapping that most of the mice that we have reason to 
expect to be in the neighbourhood do get caught, and that the length of trapping we 
adopted is justified.* 

One point should be borne in mind when considering these records. The mice were kept 
in cages and only set free, when the trapping was finished, at the centre of the trapping site 
at which they were caught. This leaves the area vacant and free for outside mice to come 
in. Rather different results would be obtained if the mice were set free at once and allowed 
to stay in their home area, and it is certain that under these conditions a good many more 
traps would be needed as a reserve to remove the local mice each successive night. Chitty 


Table 1. 24-29 March 1941. Six traps in each group. Four groups for first 4 days, and 
four groups for 6 days 

Table 2. 18-23 March 1940. Six traps in each group. Six groups for 6 days 

[Tables 1 and 2: the columns give the index no. to site and the catch on successive 
nights, with totals; the individual entries are not legible in the source.] 

The tables show the catch from all traps set during the periods stated. Each group of six traps is entered 
separately to show the local variation in rate of catching that occurred. The index number for each group 
enables a reader to find its position on the map. 


warned us that local mice block the traps each night and our own experiences confirm this. 
For example, we set free a series of thirty-one females as soon after they were caught as 
possible because they were either pregnant or nursing mothers. These mice gave an 
aggregate of ninety-nine captures, all on consecutive nights except for one mouse that 
missed two nights and one that missed one night. 


6. EFFICIENCY OF THE TRAPPING 


In order to know how definitely we can assume that a mouse had either died or emigrated 
because it was not caught at any given trapping, it is necessary to get some idea of the 
likelihood of catching any mouse known to be alive in the area, in other words to have some 
test of the efficiency of the trapping. The best figures for this test are shown in Table 3; 
they are those for the season 1938-9, in which seven trappings were made in each hexagon 
throughout the area marked by the broken line on the map. Each horizontal line in the 
table represents a batch of mice all caught for the first time and for the last time in the 


* Bole (1939, p. 57) has given reasons for choosing 3 days as the period for trapping. 

same two months, which are indicated by the x’s at either end of the line.* At each 
intervening trapping the whole batch must have been alive but might have missed being 
caught; the table shows for each batch: 

(a) the number of possible catches (bold type); 

(b) the number of mice missed (italics). 

The numbers (a) and (b) are totalled for all batches at the bottom of the tables, and hence 
the percentage of misses can be calculated for each month. 


Table 3. Monthly distribution of misses, 1938-9 

[Body of table: for the east woods (months N., D., J., F., M., A., J.) and the west woods 
(months N., D., J., F., M., A., M.) each batch of mice is entered with an x under its first 
and last months of capture, the number of possible catches under each intervening month, 
and the number of misses beneath. The individual batch rows are not legible in the 
source; the monthly summaries are given below.] 

Summary, east woods 
Month                       D.     J.     F.     M.     A.    Grand total 
Total possible catches      51     50     73     60     38       272 
Total misses                25      0     26      5      1        57 
Percentage of misses        49      0     36    8.3    2.6      21.0 

Summary, west woods 
Month                       D.     J.     F.     M.     A.    Grand total 
Total possible catches      20     39     52     49     44       204 
Total misses                 1      2      3      3      1        10 
Percentage of misses       5.0    5.1    5.8    6.1    2.3       4.9 

Note on East Woods. For December and February 51 misses out of 124 or 41.1%; for other 
months 6 misses out of 148 or 4.1%. 


* Thus of the fifty-one mice caught in the east woods for the first time in November, nine were caught for the 
last time in January, five for the last time in February, five in March, seventeen in April and fifteen in June. 

In the east woods there were many misses in December and February owing to snow in 
the week of trapping, but if these two records are excluded the remaining figures (including 
no miss in January) give a total proportion of misses of only 4.1%. In the west woods, 
where no exceptionally bad weather occurred during trapping, the total proportion of 
misses was 4.9%. 
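
The percentages quoted here can be checked from the monthly summary of Table 3. The sketch below is ours, not the authors' working; it simply pools misses and possible catches over the months named, and the variable names are illustrative.

```python
# Monthly summary for the east woods, as printed at the foot of Table 3.
possible = {"Dec": 51, "Jan": 50, "Feb": 73, "Mar": 60, "Apr": 38}
misses = {"Dec": 25, "Jan": 0, "Feb": 26, "Mar": 5, "Apr": 1}

def miss_percentage(months):
    """Pooled percentage of misses over the given months."""
    total_missed = sum(misses[m] for m in months)
    total_possible = sum(possible[m] for m in months)
    return 100.0 * total_missed / total_possible

snow_months = ["Dec", "Feb"]  # the two trappings affected by snow
other_months = [m for m in possible if m not in snow_months]

east_all = miss_percentage(possible)        # every month
east_snow = miss_percentage(snow_months)    # December and February only
east_other = miss_percentage(other_months)  # the remaining months
west_all = 100.0 * 10 / 204                 # west woods total from the text

for label, value in [("east, all months", east_all),
                     ("east, snow months", east_snow),
                     ("east, other months", east_other),
                     ("west, all months", west_all)]:
    print(f"{label}: {value:.1f}%")
```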

If the figures of Table 3 are subdivided between (1) central trapping sites surrounded by 
other trapping sites, and (2) peripheral trapping sites on the edge of the trapped area, we 
find the following results: 

East Woods 
  Central             21 misses out of  93 chances or 22.6% 
  Peripheral          36 misses out of 179 chances or 20.1% 
  Total as in table   57 misses out of 272 chances or 21.0% 
West Woods 
  Central              3 misses out of  64 chances or  4.7% 
  Peripheral           7 misses out of 140 chances or  5.0% 
  Total as in table   10 misses out of 204 chances or  4.9% 


There is obviously no real difference in the incidence of misses in the centre and periphery 
of the area. The only distinction we have been able to detect is that the six mice that missed 
twice running were all on the periphery, and even this does not mean much as there were 
only five trapping sites in the centre compared with sixteen on the periphery. 
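
The authors judge the central and peripheral figures by inspection. One way a reader might check that judgement is a two-proportion comparison; the sketch below is our own addition and assumes the usual pooled standard error, which the paper does not itself use here. The east woods peripheral figures are obtained by subtracting the central figures from the table totals.

```python
import math

def two_proportion_z(miss_a, n_a, miss_b, n_b):
    """Difference of two miss rates divided by its pooled standard error."""
    p_a = miss_a / n_a
    p_b = miss_b / n_b
    pooled = (miss_a + miss_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# West woods: central 3 of 64, peripheral 7 of 140.
z_west = two_proportion_z(3, 64, 7, 140)

# East woods: central 21 of 93; peripheral = totals (57 of 272) minus central.
z_east = two_proportion_z(21, 93, 57 - 21, 272 - 93)

print(f"west woods z = {z_west:.2f}, east woods z = {z_east:.2f}")
```

Both ratios are well inside the conventional limit of about 2, which agrees with the statement that centre and periphery do not differ.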

It thus appears that under ordinary conditions of weather the rate of misses was about 
5%, or that there was a 20 to 1 chance of catching a given mouse. In bad weather the 
chance of catching was less, but missing a mouse in two consecutive months was rare. 
We may conclude therefore that the trapping was effective and that a mouse no longer 
caught had probably either died or emigrated. 

Another opportunity for testing the efficiency of trapping occurred in the next season, 
1939-40. Only two trappings, in December and March, were made over the whole area, but 
four hexagons were also trapped in February (D3, E3, F3 and G3). This smaller trapping 
was done in a short mild spell in the phenomenally severe frost of that year, to find out 
how many of the December mice had survived the hard weather. Out of twenty mice 
caught in both December and March not one was missed in February. Perhaps hunger 
helped to cause this complete catch. 


7. MONTHLY SURVIVAL RATE 


In considering the records of survival for 1938-9 (details of which are given in § 8), all mice 
accidentally killed have been excluded for obvious reasons. In the present section all mice 
when caught for the first time have also been excluded because so many were found to 
disappear within the first month after capture; these are studied separately in § 12. 

The proportions surviving each month, shown in Table 4, are so similar as to suggest that 
the rate of disappearance during the 4 months considered was very nearly constant and can be 
approximately described by the mean monthly survival rate 0.876. This ratio, which means 
that we should expect seven out of eight mice to be alive at the end of a month, can be used 
as a standard with which to compare survival over other periods of time and in other groups 
of mice. For this purpose we may use the formula $y_x = y_0\,(0.876)^x$, where $x$ is the time in 
months measured from the beginning of the period, $y_0$ the number of mice at the beginning 
of the period and $y_x$ the number at the end. 
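
As a quick illustration of how the formula is applied, the sketch below evaluates it for two cases taken from later sections: the eighty-nine November mice after 1 month, and the eighty-six April mice after 8.5 months. The function name is ours.

```python
STANDARD_MONTHLY_RATE = 0.876  # mean monthly survival rate from Table 4

def expected_survivors(initial, months, rate=STANDARD_MONTHLY_RATE):
    """Number expected alive after `months`, starting from `initial` mice."""
    return initial * rate ** months

after_one_month = expected_survivors(89, 1)    # November batch of section 8
after_summer = expected_survivors(86, 8.5)     # April catch of section 8

print(f"89 mice after 1 month:    {after_one_month:.1f}")
print(f"86 mice after 8.5 months: {after_summer:.1f}")
```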

Since the number of mice disappearing is proportional to the number present, either the 
predators disappear at the same rate, or they find increasing difficulty in catching the 
mice as these become larger and scarcer; possibly the larger the mouse the more lasting 
the meal. 


Table 4. Number of mice surviving from one trapping till the next. 
December 1938 to April 1939 

Period         Mice caught in    No. surviving at    Proportion 
               first month       next trapping       surviving 
Dec.-Jan.            82                 72             0.878 
Jan.-Feb.           105                 91             0.867 
Feb.-Mar.           141                124             0.879 
Mar.-Apr.*          131                108             0.824* (0.879) 

* This period was 1.5 months and the proportion for one month is shown in italics (here 
in brackets). This rate $r$ is calculated thus: $108/131 = r^{1.5}$. 
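
Written out in full, the conversion in the footnote runs as follows (the arithmetic is ours, carried to three places):

```latex
\[
r^{1.5} = \frac{108}{131} \approx 0.824
\quad\Longrightarrow\quad
r = \left(\frac{108}{131}\right)^{1/1.5}
  = \left(\frac{108}{131}\right)^{2/3}
  \approx 0.879 .
\]
```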


8. SURVIVAL FROM ONE TRAPPING TO THE NEXT 


The seven trappings of 1938-9 enable us to record the progressive diminution of the group of 
mice first caught in any given month. It will be seen from the map described on p. 337 that 
only fifteen of the hexagons set in this season were set again in the following year, so that 
the mice from the other six hexagons are not recorded as their later history is not known. 
Trapping stretched over a fortnight, as we could not cover the whole area in a week; the 
western seven hexagons were set in the first week and the eastern eight in the second. 

Trapping started on 14 November 1938, and the four intervals between the first five 
trappings were each of 1 month. The interval between the March and April trappings was 
1.5 months (6 weeks). Then the western part was trapped in May after an interval of 1 
month, and the eastern part in June after 2 months. These two trappings are combined 
to form the seventh trapping at an average interval of 1.5 months, giving an error of 
2 weeks which is negligible over the whole period. The area was trapped again in December, 
giving an interval of about 7 months, and the next interval was exactly 3 months to March 
1940. Not one of these mice was caught after this although fifteen hexagons were trapped 
in the following December and March. 

These details as to times, and those already given about the places and methods of 
trapping, are tedious, but are essential for estimating the reliance that can be placed upon 
the results obtained. The facts relating to the times of trapping are used for calculating 
the ‘periods of survival’ used in the following tables, and up to the sixth trapping may be 
used for finding the actual date of trapping if necessary. 

Eighty-nine mice were caught in November, and the number of these caught again or 
known to be alive at each successive trapping is shown in Table 5. The results from the 
middle strip of seven hexagons are shown separately from those for the eight marginal 
hexagons; the two are then added together to give the totals for the eighty-nine mice. 
The numbers first caught, forty-five and forty-four, in the two subdivisions of the area, 
are so similar that the figures are readily comparable without calculating proportions, and 
it is obvious that there is no appreciable difference between the margin and the centre of 
the area. Below the table the numbers expected from the standard survival rate are 
compared with the actual totals. The number for the first month is much lower than the 
standard, and this will be discussed in § 12 on the meaning of mice caught once only. During 
the rest of the winter the figures correspond closely, but after the April trapping the totals 
fall below the numbers expected. The significance of the difference between observed and 
expected survivals is indicated under each figure by the ratio of this deviation to the standard 
error according to the formula 

$$\frac{n_t - p\,n_{t-1}}{\sqrt{n_{t-1}\,p\,(1-p)}},$$ 

where $p = 0.876$, $n_{t-1}$ is the number of mice known to be alive during the preceding 
month, and $n_t$ the number found to be alive the following month. 


Table 5. Survival of mice first caught in November 1938 

Month of trapping                      Nov.   Dec.   Jan.   Feb.   Mar.   Apr.   May-June   Dec.    Mar. 
                                       1938   1938   1939   1939   1939   1939     1939     1939    1940 
Length of survival in 'lunar months'     0      1      2      3      4     5.5     c. 7     c. 14   c. 17 
Mice from middle row of hexagons        45     29     27     22     20     18       13        2       1 
Mice from marginal hexagons             44     27     24     21     19     16        7        2       1 
Totals                                  89     56     51     43     39     34       20        4       2 
Nos. expected with monthly survival 
  rate of 0.876                         89     78     49     45     38     32       28        8       3 
Deviation / standard error               0   -7.05   0.79  -0.71   0.62   0.85    -3.51    -1.79      ? 

(? = entry not legible in the source.) 
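
The ratios in the last row of Table 5 can be reproduced, to within rounding, with a few lines of code. The sketch below is ours; in particular, raising 0.876 to the power of the interval for periods other than one month is our assumption about how the longer intervals were handled, although it does agree with the printed figures.

```python
import math

def deviation_ratio(n_prev, n_now, months=1.0, monthly_rate=0.876):
    """(observed - expected) / standard error for one interval between trappings."""
    p = monthly_rate ** months              # survival probability over the interval
    expected = n_prev * p
    standard_error = math.sqrt(n_prev * p * (1 - p))
    return (n_now - expected) / standard_error

# Totals row of Table 5: 89 -> 56 (Nov. to Dec.) and 56 -> 51 (Dec. to Jan.).
first_month = deviation_ratio(89, 56)
second_month = deviation_ratio(56, 51)

print(f"Nov.-Dec.: {first_month:.2f}")
print(f"Dec.-Jan.: {second_month:.2f}")
```

A ratio beyond about 2 in either direction is conventionally taken as significant, which is why only the first month and the May-June trapping stand out in the table.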

Another series worth studying are the seventy-three mice caught in January 1939. 
Table 6 shows how this group is made up of twenty-five mice remaining out of thirty first 
caught in December, and forty-eight first caught in January. 

Those caught first in January show the low survival for the first month noticed in 
Table 5. The twenty-five mice first caught in December were survivors of a group of thirty, 
a small catch owing to snow in the east woods. In this case there was no excessive loss in 
the first month, and the fact that such a loss did not occur when only a few mice, pre- 
sumably living near the trap sites, were caught has a bearing on the discussion of mice 
caught once only (see § 12). 

After the May-June trapping the totals are again below the numbers expected, so that 
the survival rate of the winter generation appears to have diminished as the summer 
generation took its place. This lowering of survival rate for the summer months is of 
doubtful significance in Tables 5 and 6, since for reasons discussed elsewhere, the May-June 
numbers are not very reliable (p. 341). A more reliable figure for survival over 8.5 months, 
including the summer, can be obtained if the disappearance by December 1939 of all the 
mice caught in April is considered. The April catch consisted of eighty-six mice, including 
the seventy-three shown as surviving in April in Tables 5 and 6. In December only eight 
of these eighty-six were recaught, whereas for the 8.5 months under consideration the 
0.876 rate would give an expected number of 27.9 mice. The ratio of deviation from 
expectation to the standard error is 10.7, which is highly significant. 


Table 6. Survival of mice first caught in December 1938 and in January 1939 

Month of trapping                      Jan.   Feb.   Mar.   Apr.   May-June   Dec.    Mar. 
                                       1939   1939   1939   1939     1939     1939    1940 
Length of survival in lunar months       0      1      2     3.5     c. 5     c. 12   c. 15 
Mice first caught in December           25     21     17     16       14        1       0 
Mice first caught in January            48     37     32     23       16        0       0 
Totals                                  73     58     49     39       30        1       0 
Numbers expected with the standard 
  monthly survival rate of 0.876        73     64     51     40       32       12       1 
Deviation / standard error               0   -2.11  -0.72  -0.43    -0.82    -4.05   -1.43 





























9. SURVIVAL FROM ONE YEAR TO THE NEXT 


We have also a series of long-term observations and can record the proportion of mice 
surviving from one season to another. Here again the number of survivors can be compared 
with the number that would have survived if the standard monthly survival rate had been 
effective. 

(a) From 1937-8 to 1938-9. Of the areas trapped during the first of these seasons (see 
p. 337 and Map) the farm land was not re-trapped in the second season, nor was the 
greater part of the park and woods, so that no long-term observations are available from 
the mice caught in these areas except that they were not found as migrants elsewhere. 

In the rectangle of 400 by 300 yd. trapped in both seasons thirty-eight mice were caught 
during the first season. Of these, twenty-one were proved to be alive in March when the 
first season ended, but not one of them was caught during the second season, either in 
November when trapping began or in any of the six subsequent trappings in each of the 
hexagons throughout the area. With the standard survival rate, six out of the twenty-one 
should have been alive in November. 

(b) From 1938-9 to 1939-40. It will be seen from § 8 (p. 341) that only fifteen of the 
hexagons trapped in the first of these two seasons are available for this comparison. Here 
218 mice were caught during the first season, and of these 137 were known to be still alive 
in March. Of this group only eight were caught in the December trapping of the second 
season and only three in March. Not one was found in the 1940-1 season, although the 
whole area was then trapped twice. The numbers to be expected from the standard 
survival rate are thirty-six for December and twenty-four for March. 

(c) From 1939-40 to 1940-1. In both these seasons identical areas were trapped in 
December and March (see p. 337 and Map). The figures are: 

Total caught in 1939-40 season                            134 
No. known to be still alive in March 1940                  79 
No. of these known to be alive in the following season: 
  In December 1940                                          2 (Expected 21) 
  In March 1941                                             1 (Expected 14) 

The survival from year to year is thus seen to be much less than that which the standard 
monthly survival rate based on the winter months of 1938-9 would lead us to expect (see 
§ 7). Survival rates calculated for the three winter months December to March of 1939-40 
and 1940-1 are 0.757 and 0.815 respectively (see data in Table 7). These rates are lower 
than the standard rate of 0.876, but, as will be seen later (p. 347), this is to be expected, 
since mice caught once only are included. The corresponding summer rate for March to 
December 1940 is 0.693, as only two mice survived out of seventy-nine. As in 1939 this is 
again lower than the winter rate. 
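
The monthly rates quoted in this section follow from the proportion surviving over the whole interval by taking the appropriate root. The sketch below is ours; the survivor counts are those given in § 10 and above, and the interval of about 10 lunar months from March to December is our assumption, chosen because it reproduces the expected numbers quoted in (b) and (c).

```python
def monthly_rate(survivors, initial, months):
    """Constant monthly survival rate that takes `initial` down to `survivors`."""
    return (survivors / initial) ** (1.0 / months)

winter_1939_40 = monthly_rate(42, 97, 3)    # December 1939 to March 1940
winter_1940_41 = monthly_rate(79, 146, 3)   # December 1940 to March 1941
summer_1940 = monthly_rate(2, 79, 10)       # March to December 1940 (interval assumed)

print(f"winter 1939-40: {winter_1939_40:.3f}")
print(f"winter 1940-1:  {winter_1940_41:.3f}")
print(f"summer 1940:    {summer_1940:.3f}")
```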


10. SURVIVAL IN RELATION TO SIZE AND SEX 


Table 7 a, b and c shows the proportion of mice surviving over 3 months interval in three 
successive seasons. The weights at the first trapping are divided into three main groups: (1) 
below 12.5 g., (2) from 12.5 to 19.9 g. and (3) 20 g. and over. Males are shown in bold type 
and females in italics. Proportions have been calculated only for the totals of each weight 
group, as the numbers of mice in the small divisions are so few. Among the winter mice 
shown in this table the weights of the males and females cover the same range, so that they 
can be combined in the same table for the advantage of larger numbers, although the 
weight groups are not really equivalent for the two sexes. The separate rates for each sex 
can be seen from the table, and perhaps the main error introduced by combining the sexes 
is that the females of 17.5 to 19.9 g. should be included in the over 20.0 g. group if that 
group is to be regarded as the fully grown mice. 

None of the proportions in the table is convincing by itself, but the uniformity seen 
throughout the three seasons seems to show that the survival rate is lower for very small 
and very large mice than for those of intermediate weight. 

The following analysis, to which the χ² test has been applied, shows that there is no 
significant difference in the survival rate of the sexes: 

From Table 7a 
  Males     37 survivors out of  68, proportion = 0.54 
  Females   26 survivors out of  48, proportion = 0.54 
  Totals    63 survivors out of 116, proportion = 0.54 
From Table 7b 
  Males     23 survivors out of  49, proportion = 0.47 
  Females   19 survivors out of  48, proportion = 0.40 
  Totals    42 survivors out of  97, proportion = 0.43 
From Table 7c 
  Males     43 survivors out of  75, proportion = 0.57 
  Females   36 survivors out of  71, proportion = 0.51 
  Totals    79 survivors out of 146, proportion = 0.54 


Table 7. The survival of mice over three winter periods of 3 months each: (a) From November 
1938 to February 1939; (b) From December 1939 to March 1940; (c) From December 
1940 to March 1941. Males are shown in bold type and females in italics (here entered on 
separate lines; ? = entry not legible in the source) 

(a) From November 1938 to February 1939 

Weights when first caught       Below 12.5 g.    From 12.5 to 19.9 g.     20 g. and over 
Weight class (g.)               7.5-    10.0-    12.5-   15.0-   17.5-    20.0-   22.5-   25.0- 
Totals caught:   Males           10      12       13      10       4       11       6       2 
                 Females          7       5       12       9       2        6       4       3 
Survivors:       Males            2       6       10       7       ?        6       3       ? 
                 Females          3       3        9       4       ?        3       2       ? 
Both sexes:      Totals caught   17      17       25      19       6       17      10       5 
                 Survivors        5       9       19      11       3        9       5       2 
Totals for each weight group: 
  Survivors                     14 out of 34     33 out of 50             16 out of 32 
  Proportion surviving          0.41 ± 0.08      0.66 ± 0.07              0.50 ± 0.09 

(b) From December 1939 to March 1940 

Weights when first caught       Below 12.5 g.    From 12.5 to 19.9 g.     20 g. and over 
Weight class (g.)               7.5-    10.0-    12.5-   15.0-   17.5-    20.0-   22.5-   25.0- 
Totals caught:   Males            1       1        8      16      18        1       ?       ? 
                 Females          0       2       12      18       8        6       ?       ? 
Survivors:       Males            0       0        3       8      10        0       1       ? 
                 Females          0       1        5       8       4        1       0       ? 
Both sexes:      Totals caught    1       3       20      34      26        7       4       2 
                 Survivors        0       1        8      16      14        1       1       1 
Totals for each weight group: 
  Survivors                     1 out of 4       38 out of 80             3 out of 13 
  Proportion surviving          0.25 ± 0.22      0.48 ± 0.06              0.23 ± 0.12 

(c) From December 1940 to March 1941 

Weights when first caught       Below 12.5 g.    From 12.5 to 19.9 g.     20 g. and over 
Weight class (g.)               (one class)      12.5-   15.0-   17.5-    (two classes) 
Totals caught:   Males            3               24      26      15        6       ? 
                 Females          7               30      23       7        2       ? 
Survivors:       Males            0               14      16      10        ?       ? 
                 Females          4               18       9       3        ?       ? 
Both sexes:      Totals caught   10               54      49      22        8       3 
                 Survivors        4               32      25      13        4       1 
Totals for each weight group: 
  Survivors                     4 out of 10      70 out of 125            5 out of 11 
  Proportion surviving          0.40 ± 0.15      0.56 ± 0.04              0.45 ± 0.15 

Standard errors of proportions are calculated by formula $\sqrt{pq/n}$. 

As has been remarked on p. 344, the proportions surviving are lower than 0.67, which is 
the proportion expected if the standard monthly survival rate of 0.876 had been maintained 
throughout the 3 months. 
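
The standard errors in Table 7, the comparison of the sexes given before it, and the figure of 0.67 can all be checked with the short sketch below. It is our own illustration: the χ² is computed for a 2 × 2 table without continuity correction, which is an assumption, as the paper does not say which form was used. The figure of 71 females in Table 7c is obtained as 146 - 75.

```python
import math

def proportion_with_se(survivors, caught):
    """Proportion surviving and its standard error, sqrt(p * q / n)."""
    p = survivors / caught
    return p, math.sqrt(p * (1 - p) / caught)

def chi_square_2x2(surv_a, n_a, surv_b, n_b):
    """Chi-square (1 degree of freedom, no continuity correction) for two groups."""
    a, b = surv_a, n_a - surv_a   # group A: survived, disappeared
    c, d = surv_b, n_b - surv_b   # group B: survived, disappeared
    n = n_a + n_b
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Weight groups of Table 7 (a): survivors, number caught.
weight_groups_a = [(14, 34), (33, 50), (16, 32)]
errors_a = [proportion_with_se(s, n) for s, n in weight_groups_a]

# Males versus females in Tables 7a, 7b and 7c.
sexes = {"7a": (37, 68, 26, 48), "7b": (23, 49, 19, 48), "7c": (43, 75, 36, 146 - 75)}
chi_squares = {name: chi_square_2x2(*counts) for name, counts in sexes.items()}

# Proportion expected after 3 months at the standard monthly rate.
three_month_standard = 0.876 ** 3

for (p, se) in errors_a:
    print(f"{p:.2f} +/- {se:.2f}")
for name, value in chi_squares.items():
    print(f"Table {name}: chi-square = {value:.2f}")
print(f"0.876 cubed = {three_month_standard:.2f}")
```

With one degree of freedom the 5% point of χ² is 3.84, and all three values fall far below it, in agreement with the authors' conclusion.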


11. DISCUSSION OF DATA ON SURVIVAL 


The progressive diminution of the population described in §§ 7-10 may be due to: (1) mice 
learning to avoid traps, (2) emigration from the area, (3) death. 

(1) If the mice learned to avoid traps in any degree, we would expect chance catches with 
long periods of absence from the traps. We have seen, in § 6 on the efficiency of trapping, 
how rare such records are. There were only six mice missed from the traps on two or more 
successive trappings, and these were found on the edge of the trapped area where occasional 
visitors from the outside might be expected. 

(2) We found no evidence of migration of mice during the winter months in which the 
trapping was done. Detailed evidence on this subject will be given when the records of 
travels are described in a later paper. 

(3) Thus, although the first two causes cannot be excluded, it is likely that the main 
cause of the disappearance of the mice is their death. In captivity mice can survive for 
longer periods than those recorded at Holwood; one of a family of newly born mice that we 
kept in a cage lived for four years and five months.* That the Holwood mice appeared to 
survive for only a short part of their possible life was probably chiefly due to predators. 
There are many enemies in the Holwood area: the dejecta of owls have been found con- 
taining remains of mice, there are badger and fox earths, stoats and grass snakes have been 
caught, while the cats and dogs from neighbouring houses must also take their toll. 

It has been seen that the survival rate of the winter population is much lower during the 
summer. This may be due to the tendency, already detected in winter, for the largest mice 
to die out (p. 344), since by April the surviving population is almost entirely composed of 
large mice (Hacker & Pearson, 1944, p. 159). Very small winter mice were also found to 
have a low survival rate, and should this hold good for the new season’s young any very 
great increase in the population would be checked. The year to year fluctuations in the size 
of the population are probably closely connected with weather conditions affecting the 
length of the breeding season. Hacker & Pearson (1944, p. 161) have shown the effect of 
the early spring and late autumn of 1938 on the constitution of the population. The most 
favourable condition for its increase would appear to be a late autumn followed by an early 
spring, in which the survivors of a large winter population start breeding early. If such 
a condition recurred a ‘plague’ year might result (Elton, 1942), but of this we have had 
no experience in Holwood. On the other hand, a long winter might be expected to lead to 
a dearth of mice. 


12. THE LARGE DISAPPEARANCE OF MICE IN THE FIRST MONTH AFTER CAPTURE 


In § 8 we have studied the gradual disappearance of two large series of mice caught in 
November 1938 and January 1939. Certain points of interest arise from studying the 
survival of each month’s catch of new mice; a larger number of mice is available, as all the 
twenty-one hexagons trapped in 1938-9 can be used instead of the fifteen in § 8. 


* This mouse still showed signs of an epiphyseal line at the lower end of the femur, a condition well known 
in Muridae. 

Table 8 shows the results obtained in the two parts of the woods lying to the east and 
west of the footpath. The interval between each trapping was 4 weeks as described in § 8. 
Read horizontally the table shows the number surviving at each successive trapping out 
of the batch of mice caught each month. The ratio of survival is shown in brackets after 
each figure and this can be compared with 0.876, the standard monthly survival rate. 

The proportion surviving for 1 month of all mice caught for the first time is 172 out of 
238, or 0.723, and for all other catches is 287 out of 328, or 0.875, approximately the 
standard rate. The rate is consistently lowest for each new batch throughout the table 
except for February when only three mice were caught; and this is why new mice have 
been omitted in calculating the standard survival rate. A discussion of the reasons for this 
large number of mice caught once only is necessary before considering the origin of the 
new mice which continue to be caught each month. 
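
The two pooled proportions just quoted can be recovered from the rows of Table 8. The sketch below is ours and uses the adjusted east woods figures (58 and 44) as printed in the table; each list gives one batch, starting with the number first caught and followed by the numbers surviving at each later trapping.

```python
# One list per monthly batch of Table 8: first catch, then later survivors.
batches = [
    [83, 58, 52, 44, 38],  # east woods, first caught Nov. (58 and 44 adjusted)
    [13, 9, 8, 7],         # east woods, Dec.
    [45, 34, 28],          # east woods, Jan.
    [3, 3],                # east woods, Feb.
    [14],                  # east woods, Mar. (no later trapping in the table)
    [32, 24, 20, 18, 18],  # west woods, Nov.
    [35, 24, 21, 17],      # west woods, Dec.
    [21, 16, 16],          # west woods, Jan.
    [6, 4],                # west woods, Feb.
    [6],                   # west woods, Mar.
]

followed = [row for row in batches if len(row) > 1]

# First month after first capture: row[0] caught, row[1] still alive.
first_caught = sum(row[0] for row in followed)
first_alive = sum(row[1] for row in followed)

# Every later month: each count except the last is "at risk" for the next one.
later_at_risk = sum(sum(row[1:-1]) for row in followed)
later_alive = sum(sum(row[2:]) for row in followed)

print(f"first month: {first_alive}/{first_caught} = {first_alive / first_caught:.3f}")
print(f"later months: {later_alive}/{later_at_risk} = {later_alive / later_at_risk:.3f}")
```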


Table 8. Monthly survival of the batch of mice caught each month 

                  Nos. of mice caught in the first month and surviving in the following months 
Month of first 
trapping          Nov.    Dec.          Jan.          Feb.          Mar. 
East woods: 
  Nov.            83      58* (0.70)    52 (0.90)     44* (0.85)    38 (0.86) 
  Dec.            -       13             9 (0.69)      8 (0.89)      7 (0.88) 
  Jan.            -       -             45            34 (0.76)     28 (0.82) 
  Feb.            -       -             -              3             3 (1.00) 
  Mar.            -       -             -             -             14 
West woods: 
  Nov.            32      24 (0.75)     20 (0.83)     18 (0.90)     18 (1.00) 
  Dec.            -       35            24 (0.69)     21 (0.88)     17 (0.81) 
  Jan.            -       -             21            16 (0.76)     16 (1.00) 
  Feb.            -       -             -              6             4 (0.67) 
  Mar.            -       -             -             -              6 

* The actual figures were 55 and 43; these have been adjusted to make allowance for probable misses in 
December and February. 


(a) The more rapid disappearance of very large and very small mice described in § 10 
must have some effect in the earlier part of the trapping season when we have shown them 
to be more common (Hacker & Pearson, 1944, p. 161). 

(b) The small catch in December and February in the east woods was due to snow and 
the evidence in § 6 on the efficiency of trapping showed that fifty-one out of the fifty-seven 
failures to catch in this area occurred in these months. This would lower the survival rate 
for November and January respectively, as some of the mice would have been recorded for 
the last time in those months instead of in December and February and so increase the 
number of mice caught once only. The November catch in the west woods was also below 
expectation, thirty-two mice compared with thirty-five in December; this may have been 
due to the traps being used for the first time after a treatment with linseed oil, and would 
have a slight effect in the same direction. 

(c) Mice living at some distance from the traps, and caught after the local mice have 
been removed, may have died owing to difficulty in getting back to their homes when all 
the mice are set free simultaneously at the centre of their hexagon or trapping site. We have 
seen on p. 337 some evidence that such a draining-in of outsiders does occur, and it is probable 
that these outsiders are at a disadvantage when set free together with the local mice. Data 
on the problem of how far mice travel from their homes, and from what distance they find 
their way back will be given in a later paper. To detect the actual home of any mouse is 
almost impossible; when mice were set free we often watched their behaviour and saw them 
disappear into holes, but these are as likely to be temporary shelters as their homes. On the 
other hand mice that are consistently caught at one trapping site in the first day or two 
of the trapping probably live near that site, therefore we shall now study the day on which 
mice were caught in each trapping period. 


Table 9. East woods. Day of catching in each month of three groups of mice 

[Body of table: for each mouse, its sex and the day on which it was caught in each of the 
months N., D., J., F., M., followed by its mean day; a dot marks a month in which the 
mouse missed being caught. The individual rows are not legible in the source. The 
column means, as far as they can be read, are given below; ? = entry not legible.] 

Group                                                   N.    D.    J.    F.    M.    Mean day 
Stay-at-homes caught every month                        2.0   1.6   1.1   1.4   1.6     1.5 
Stay-at-homes that missed being caught at least once    2.6   4.0   1.4   3.4   2.8     2.9 
Travellers                                               ?     ?    1.6   3.1   2.6     3.1 





13. DAY CAUGHT AND DISTANCE FROM TRAPS 


It can be seen from Table 8 that fifty-six mice lasted throughout the period from November 
to March. These may be divided into Travellers, defined as mice caught in more than one 
trapping site, and Stay-at-homes, the mice caught at one site only. The latter can be further 
divided into (1) those caught in each month of the period, and (2) those which missed being 
caught at least once. In comparing travellers with stay-at-homes only east woods’ mice are 
taken, since eleven out of the twelve travellers were caught in the east woods and since 
trapping was not simultaneous in the two areas so that the day of catching may have been 
influenced by different weather. 

Table 9 shows the day on which each east woods’ mouse was caught in each of the 
5 months. A dot indicates that a mouse missed being caught, in which case the mouse is 
regarded as having been caught on the fifth day in calculating the mean day of catching; 
the traps were only left out on a fifth day in November and March in the east woods and in 
March in the west woods. 
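
The scoring rule just described, in which a miss counts as a fifth-day catch, can be illustrated with a minimal sketch. The function name `mean_day` and the use of `None` to mark a miss are assumptions made for illustration; the two records are rows read from Table 9.

```python
from typing import Optional, Sequence

MISS_DAY = 5  # a miss is scored as a fifth-day catch, as stated in the text


def mean_day(days: Sequence[Optional[int]]) -> float:
    """Mean day of catching over the months; None marks a month in which the mouse was missed."""
    scored = [MISS_DAY if d is None else d for d in days]
    return sum(scored) / len(scored)


# Two records as read from Table 9 (Nov. to Mar.); None stands for the dot in the table.
traveller = [3, None, 3, 1, 1]
misser = [5, 1, 1, None, 3]

print(mean_day(traveller))  # 2.6
print(mean_day(misser))     # 3.0
```

Because a missed mouse might in fact have been caught later than the fifth day, means computed in this way are lower bounds for the groups containing misses.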

It will be seen that among the mice caught in every month the mean day for each mouse
ranges from 1.0 to 2.6, while for all the mice it is 1.5; none was caught as late as the fifth
day. Among the travellers the mean day ranges from 1.6 to 4.2, and the grand mean is 3.1.
The stay-at-homes that missed being caught at least once have a range and grand mean 
resembling those of the travellers, among which there was also a high proportion of misses. 

In the west woods there was no snow in December and February, and only three mice 
missed being caught in all the five months, too small a sample to be worth recording. 
Table 10 shows the stay-at-homes caught in each month for comparison with those from 
the east woods which they closely resemble in the distribution of day of catching. 


Table 10. West woods. Day of catching of stay-at-homes caught every month 



































                      Months
Sex     Nov.   Dec.   Jan.   Feb.   Mar.     Mean day

M.       1      1      1      1      1         1.0
F.       1      1      1      2      2         1.4
M.       3      1      1      1      1         1.4
F.       3      1      1      1     [?]        [?]
F.       3      1      1      1     [?]        [?]
M.       4      1      1      1      1         1.6
M.       4      1      1      2      1         1.8
M.       1      1      1      3      4         2.0
F.       2      1      2      3      2         2.0
M.       1      3      2      3      2         2.2
F.       4      2      1      2      2         2.2
F.       4      2      2      1     [?]        [?]
F.       4      2      3      3      4         3.2
M.       3      4      3      4      4         3.6

Mean    2.7    1.6    1.5    2.0    2.1        2.0





The means for the stay-at-homes caught each month are correct values, but the true 
means for the other groups might have been higher had the trapping been indefinitely 
continued, since some at least of the missed mice are likely to have been caught later than 
the fifth day allotted to them; the real difference between the groups may therefore have 
been greater than shown. 

Instead of comparing means, the frequency of each day of catching in the four groups of 
mice can be shown by histograms as in Fig. 1; these illustrate the differences just described. 

The travellers were clearly mice that lived within wandering distance of more than one 
trapping site and therefore probably at a greater distance from any one of these than the 
stay-at-homes, so that they tended to be caught on a late day after the stay-at-homes had 
been removed. The large number of misses suggests that in the unfavourable weather of 
December and February they did not wander far enough to be caught at all. The stay-at- 
homes which also missed in those months also tended to be caught on a late day and 
probably also lived at some distance from the traps. These data support the assumption 
that the day of catching is an indication of the distance at which a mouse lives from the 
traps. 


Biometrika 33 22 

From Tables 9 and 10 it appears that the day of catching was considerably influenced
by the month, a reflexion of seasonal and fortuitous weather conditions (p. 337). The 
monthly means for the different sets of mice are given at the foot of the tables and are 
compared in Fig. 2. November and March were on the whole late months and January an 
early month; the lateness of the travellers in December and February is due to the large 
number of misses (counted as fifth day catches) in these months. 


[Fig. 1: four histograms of number of mice against days before capture (1 to 5, and not caught). Panels: East woods' travellers caught in more than one area; East woods' stay-at-homes, (1) missers caught in same area but not at every trapping (total: 60 records); East woods' stay-at-homes, (2) non-missers caught in the same area at every trapping (total: 65 catches); West woods' stay-at-homes, non-missers (total: 70 catches).]

Fig. 1. Frequency of day of catching (from Tables 9 and 10).


Owing to this source of variability in the day of catching it is clear that in considering the
mice caught once only these must not all be grouped together to obtain a mean day, since
many more were caught in some months than in others. A frequency table of these mice 
is given in Table 11, and if the distribution of the days of catching and the monthly means 
at its foot are compared with those of the stay-at-homes from the corresponding parts of 
































H. P. Hacker and H. S. Pearson 351

[Fig. 2: two panels, East woods and West woods; vertical axis, mean day of catching, 1.0 to 5.0.]

Fig. 2. Data from Tables 9, 10 and 11. Mean day of catching in each month for: stay-at-homes caught every month; stay-at-homes missed at least once; travellers, visiting more than one site; mice caught once only.

Table 11. Mice caught once only. Number caught on each day of
trapping in the two parts of the woods

                       East woods                            West woods
Day of                 No. of mice                           No. of mice
catching     Nov.   Dec.   Jan.   Feb.   Mar.      Nov.   Dec.   Jan.   Feb.   Mar.

1             4      3      1      0      0         3      3      0      0      0
2             8      0      3      0      0         2      3      1      0      1
3             6      0      1      0      0         1      3      4      1      0
4             3      1      6      0      0         2      2      0      1      2
5             7      .      .      .      5         .      .      .      .      0

Mean day     3.0    1.75   3.1     -     5.0       2.25   2.4    2.8    3.5    3.3

the woods, as is done in Fig. 2, it will be seen that on the whole they tend to be later. This
tendency for the mice caught once only to resemble the travellers in being caught on a late 
day indicates that they too lived for the most part at a distance from the traps. 


14. DAY CAUGHT AND FREQUENCY OF CATCHING 


The relationship of day caught to frequency of catching may next be considered from the 
point of view of a single trapping site, irrespective of whether a mouse was caught in other 
months in any other trapping site or not. The travellers of the last section are regarded as 
having missed being caught, but misses are not counted as fifth day catches and any actual 
fifth day catches are omitted, which makes it permissible to combine east and west woods 
and so obtain larger numbers. To make use of further available data, the April catch is 
included throughout this section. 

Twenty mice were always caught at the same trapping site in the first 4 days of trapping
in each of the 6 months November 1938 to April 1939. The following figures show that they
tended to be caught on the first day:


Day of catching        1     2     3     4    Total
Number of catches     65    32    11    12     120
Percentages           54    27     9    10     100


Mean day of catching 1.75.


In Table 12 these percentages are combined with those for mice caught at one trapping 
site in only 5, 4, 3, 2, 1 of these months, and caught elsewhere, or on the fifth or sixth day 
of trapping, or not at all, in the other months. The number of mice in each group is given 
in col. 2. This number multiplied by the number in col. 1 will give the total catchings on 
which the percentages are based. Thus the actual numbers can be reconstituted and the 
mean day of catching, shown in col. 4, calculated. 
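
The reconstitution described above can be sketched as follows, using the group of twenty mice caught in all 6 months, for which the actual numbers are already known. The function name `reconstitute` is an assumption made for illustration, and rounding each percentage back to a whole number of catches is a choice of this sketch, not a step stated in the text.

```python
def reconstitute(months_caught, n_mice, percentages):
    """Recover catches per day and the mean day from the percentages of Table 12.

    months_caught -- col. 1, number of months the mouse was caught in one locality
    n_mice        -- col. 2, number of mice in the group
    percentages   -- col. 3, percentage of catches on the first to fourth day
    """
    total = months_caught * n_mice  # total catchings on which the percentages are based
    counts = [round(p * total / 100) for p in percentages]
    mean = sum(day * c for day, c in enumerate(counts, start=1)) / sum(counts)
    return total, counts, mean


total, counts, mean = reconstitute(6, 20, [54, 27, 9, 10])
print(total, counts, round(mean, 2))  # 120 [65, 32, 11, 12] 1.75
```

For the other groups the rounded counts need not sum exactly to the total, since the printed percentages are themselves rounded.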


Table 12. Percentage of catches on each day of trapping 





























        (1)               (2)                  (3)                       (4)
No. of months mouse      No. of           Day of catching             Mean day
caught in one locality    mice     First  Second  Third  Fourth      of catching

         6                 20       54      27      9      10           1.75
         5                 31       53      26     13       8           1.75
         4                 34       39      27     21      13           2.07
         3                 41       32      27     22      19           2.29
         2                 39       21      24     28      27           2.62
         1                140       22      24     30      24           2.56

















The percentage of first day catches is seen to decrease markedly, and that of third and 
fourth day catches to increase, as the number of times a mouse was caught in the same 
locality decreases. The mean day of catching for each group of mice shown in the last 
column also increases, but the significance of this figure is considerably reduced by the 
monthly variation due to weather conditions already noted in the last section; for if mice 
are grouped together simply on the grounds of the number of times they were caught 
throughout the whole season, it is clear that the different months will not be evenly
represented in each group. In Table 13 the monthly means are therefore shown separately,
revealing the differences. 

In spite of these monthly differences, except in November, the tendency of the mean day 
to become later among the mice less frequently caught is still evident if each month’s array 
is considered separately. Fig. 3 shows this diagrammatically. 

The data are evidence that the more often mice were caught in any locality the earlier
on the whole was the day of catching, which gives further support to the assumption that
this day indicates how far a mouse lived from the traps.


Table 13. The mean* day of catching for each month in mice 
grouped according to the number of times caught 




















No. of times mouse                          Months                        Grand
caught in one locality    Nov.   Dec.   Jan.   Feb.   Mar.   Apr.         mean

         6                2.5    1.8    1.8    1.8    1.9    1.4          1.75
         5                2.2    1.8    1.4    1.6    2.2    1.6          1.75
         4                2.3    1.7    2.0    2.2    2.3    1.8          2.07
         3                2.3    1.9    2.1    2.6    2.7    1.7          2.29
         2                2.8    2.2    2.3    3.2    2.9    2.6          2.62
         1                2.45   2.3    3.0    3.8    3.0    2.2          2.56

       Mean               2.4    1.9    2.1    2.3    2.4    1.9





* The number of mice on which these means are calculated can be obtained from Fig. 3. 


15. SURVIVAL RATE AND FREQUENCY OF CATCHING 


Since the change in day of catching is a gradual one, and there is little difference in the 
tables between mice caught once only in any one locality and mice caught twice only, the 
question arises whether the survival rate also varies with the number of times a mouse was 
caught. If this were so, the mice which were only proved alive over a short period would 
seem as likely to have failed to revisit the traps through living at a distance from them 
as to have failed to survive. If a return is made to Table 8 and the mice are grouped 
according to how long they were known alive in the area as a whole, the survival rate of 
each group can be calculated and compared with the standard rate of 0.876.





No. of months
previously        No. of      Survivors       Survival
known alive        mice       next month        rate

     0             238           172           0.717
     1             165           145           0.879
     2             101            86           0.851
     3              62            56           0.903

Here there is no gradual increase in the survival rate comparable with the gradual
decrease in the mean day of catching; but the survival rate of mice caught for the first 
time is outstandingly low. From this it may be inferred that whereas mice were being 
drained into the traps over a continuous area there was a limit to the distance from which 
they could find their way home; those living beyond this limit failed to reach home under 
the conditions of the experiment and probably died. This would account for the lowering 
of the survival rate of first catches, as suggested on p. 347, and justifies the exclusion of these 
mice in calculating the standard rate in § 7. 


16. RELATION BETWEEN DISAPPEARANCE OF MICE AND APPEARANCE OF NEW MICE 


In § 12, Table 8 was read horizontally to trace the survival of each batch of new mice. 
If the columns are added up vertically we get the total number, newcomers and old 
acquaintances, caught each month. This has been done and the results are shown in col. 5 


























Table 14

                  No. caught                  Summary
Month          New       Mice      Total caught   Total lost     No. caught
caught         mice      lost        to date       to date       each month
               (1)       (2)           (3)           (4)            (5)

East woods:
Nov.            83        -             83            -              83
Dec.            13       25             96           25              71
Jan.            45       10            141           35             106
Feb.             3       20            144           55              89
Mar.            14       13            158           68              90
Apr.             7       21            165           89              76
West woods:
Nov.            32        -             32            -              32
Dec.            35        8             67            8              59
Jan.            21       15             88           23              65
Feb.             6       10             94           33              61
Mar.             6        6            100           39              61
Apr.             7       10            107           49              58











No. caught each month, east and west woods combined: 
Nov. 115, Dec. 130, Jan. 171, Feb. 150, Mar. 151, Apr. 134. 


of Table 14. Cols. 1 and 2 are readily extracted from Table 8, and the figures are summed 
up to date in cols. 3 and 4 from which the monthly figures in col. 5 may also be derived by 
taking the total number lost to date from the total caught to date. The low catches for 
November in the west woods and December in the east woods have been referred to on 
p. 347. The latter was made up for by the large catch in January, the average of the two 
months being nearly the same as that for the whole period. 
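
The derivation of col. 5 from cols. 1 and 2 can be checked with a short sketch. The function name `caught_each_month` is an assumption made for illustration. The number of east woods' mice lost in March is taken here as 13, the value required for the totals lost to date (55 in February, 68 in March) to agree; this too is an assumption of the sketch.

```python
from itertools import accumulate


def caught_each_month(new_mice, mice_lost):
    """Col. 5 of Table 14: total caught to date (col. 3) less total lost to date (col. 4)."""
    total_caught = list(accumulate(new_mice))   # col. 3
    total_lost = list(accumulate(mice_lost))    # col. 4
    return [c - l for c, l in zip(total_caught, total_lost)]


# Cols. 1 and 2 for November to April; no mice can be lost before the first trapping.
east_new, east_lost = [83, 13, 45, 3, 14, 7], [0, 25, 10, 20, 13, 21]
west_new, west_lost = [32, 35, 21, 6, 6, 7], [0, 8, 15, 10, 6, 10]

east = caught_each_month(east_new, east_lost)
west = caught_each_month(west_new, west_lost)
combined = [e + w for e, w in zip(east, west)]

print(east)      # [83, 71, 106, 89, 90, 76]
print(west)      # [32, 59, 65, 61, 61, 58]
print(combined)  # [115, 130, 171, 150, 151, 134]
```

The combined figures reproduce those given at the foot of Table 14.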

The striking feature of this table is the apparent stability of the population as judged by 
the number caught each month (see col. 5 and foot of the table); although in each month 
and at each trapping site the new mice did not exactly replace those lost, in the area as a 
whole and over the whole period they more than replaced them. 

Light on this problem can be obtained by studying the proportion of new mice to old
caught on each successive day of the trapping. The data are given in detail in Table 15.
Although the material is the same as that studied in Tables 8 and 14 the figures do not
agree for two reasons. In studying survival a mouse is considered to be alive if it is caught
in a subsequent month, but in this table only actual captures are entered. On the other
hand mice excluded from the tables of survival because they were accidentally killed in the
trapping are included here, as the point of interest is whether they had been caught
before or not.

















Table 15 
First day Second day Third day Later Total caught 
Month 
caught 
New Old New Old New Old New Old New Old 
x wi BRI BRI Re Bik Bi. BK BRI Rik F 

East: 

Dec. 6 6/11 10 1 3 3 i-—@ 4. -1 3...@ $... 2 8 6,19 15 

Jan. 4 2/120 16 ‘ 71123 6 3.66 6 3/138 5& 0 2) 25 20| 37 27 

Feb. °. O19 S sae 3 6 0 O 8s 3 2 0 6 «6 2 11/34 2 

Mar. 0 01h $3 l 1; 12 13 2 0/14 5& 9 1}30 413 314 @ 

Apr. 2 0;14 15 7; wis F . ae | 6 3 os 9 3 § 2140 2 
West 

Dec. 6 63 8 6 .: 3 _ a 8 3 1 0 3 4 2 O} 24 17) 15 10 

Jan. 2 0O;}18 14 2 3 8 6 6 63 &§ 2 3.663 0 oO} 138 81|25 2 

Feb. » Ota 0 O 6 6 Ss AL & z_ S 6 3)36 2 

Mar. 0 Oo; 3 2) 4 8; 0 90 a 4 1:38 8 56 1])32 2 

Apr. 0 0/18 9 l 5 6 7 : © 6 3 = oS & @ 4 3|34 2 
Totals: 

New 19 10 18 21 25 14 41 18 103 63 

Old 134 98 68 66 63 25 54 27 319 216 

| 






































From an inspection of the columns it is obvious that there are more males than females 
except on the second day. The proportion of males in the new mice is 103 out of 166, or 
62%. This may not represent the condition in the field but indicate that the males are 
drawn from a greater area than the females. It is known from the work of Chitty (1937, 
p. 52), Burt (1940, p. 25), Blair (1942, p. 27) and others, that males of Apodemus and 
Peromyscus travel more widely than females. Some evidence of this can also be found in 
Tables 9 and 10, although the groups are small. All the eleven travellers there are males 
but only half of the stay-at-homes, fourteen out of twenty-seven in the east woods and 
seven out of fourteen in the west woods. If however the fourteen east woods’ stay-at-homes 
which missed being caught in bad weather are considered as a separate group, ten are found
to be females, while nine of the eleven male travellers also missed in bad weather. This
suggests that males travelled further than females but that in bad weather the movement
of both sexes was checked.


To summarize the data of Table 15, males and females and the two areas can be com- 
bined, and it will be found that those caught previously tend to be caught before the new- 
comers. In January only eight new mice were caught on the first day in spite of the fact 
that this month showed the largest number, seventy, of first day catches. In February and 
March no new mice were caught on the first day and in April only two out of the large catch
of fifty-eight. This tendency is demonstrated in Table 16 and Fig. 4 where the proportion
of new mice in each day’s catch is given as a percentage. Only percentages are given, as
the actual numbers can easily be obtained from Table 15.
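
As a check on how these percentages are formed, the two first-day figures quoted in the text can be worked through in a minimal sketch. The function name `percent_new` is an assumption made for illustration.

```python
def percent_new(new_mice, total_catch):
    """Percentage of new mice in one day's catch."""
    return 100 * new_mice / total_catch


# First-day catches quoted in the text: January, 8 new out of 70; April, 2 new out of 58.
print(round(percent_new(8, 70)))     # 11
print(round(percent_new(2, 58), 1))  # 3.4
```

These agree with the first-day entries for January and April in Table 16.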

Still further information is gained by combining the records from each trapping site as 
in § 14. In that section the mice were grouped according to how often they were caught 
on any one site. Instead, they may be grouped according to the month in which they were 
































Table 16

             Percentage of new mice in each day’s catch
Month     First day    Second day    Third day    Later

Dec.         35            54            67        [?]
Jan.         11            35            55        92
Feb.          0             5            10        25
Mar.          0           7.5             7        30
Apr.        3.4             9            20        22

[Fig. 4: one panel for each month, Dec., Jan., Feb., Mar. and Apr.; vertical axis, percentage of new mice; horizontal axis, days of trapping. Note. L = fourth day and later.]

Fig. 4. Percentage of new mice on each day of trapping (data from Table 16).





























first caught. In Table 17 the mean day of catching of each such group is given for each
month, the monthly arrays being kept separate because of the independent effect of season
already pointed out on p. 350.


If the vertical monthly arrays are traced downwards in the table it will be seen that the
mean day on which mice were caught in any month tends to become later as the month of 
first catching becomes later. As the groups are often small and many unknown factors 
must have affected the catching it is understandable that the results are not completely 
regular; the groups caught after February are very small indeed and any mean based on 








[Fig. 5: one rectangle for each month of first catching, Nov. to Apr.; vertical axis, mean day of catching, 1.0 to 3.3; means plotted against month recaught.]
Month first caught → Nov.; Dec.; Jan.; Feb.; Mar.; Apr.
Month recaught → N. D. J. F. M. A.; D. J. F. M. A.; J. F. M. A.; F. M. A.; M. A.; A.
No. of mice → 11[?] 52 57 44 38 33; 60 33 27 21 17; 83 42 35 27; 14 7 4; 14 4; 30

Fig. 5. Diagrammatic representation of the data of Table 17. The mean day of catching for all mice in each month, given at the bottom of Table 17, is shown by the horizontal step; these mean lines are the same in each rectangle. The dots represent the data from the body [...] tends to become later as the month in which the mouse was first caught becomes later. Where the mean depends on less than ten mice the result is shown by a circle instead of a dot.

less than eight mice is shown in italics to denote its unreliability. Nevertheless, the general
tendency is clearly evident: the later the month in which a mouse was first caught the later
the day of catching. Fig. 5 shows this tendency graphically by the method used in Fig. 3.
The means from the arrays in Table 17 are shown as spots and the number of mice on which
each is based is given below it; results depending on less than eight mice are represented by
rings instead of spots. The line across each rectangle marks the monthly mean as before.


Table 17. The mean day of catching for each month, for mice grouped 
according to the month in which they were first caught 


























                               Month in which recaught
Month first caught      Nov.   Dec.   Jan.   Feb.   Mar.   Apr.

Nov.                    2.4    1.7    1.5    2.0    2.0    1.4
Dec.                     -     2.2    1.6    1.7    2.4    1.9
Jan.                     -      -     2.7    2.7    2.5    2.0
Feb.                     -      -      -     3.2    2.9    1.0
Mar.                     -      -      -      -     2.8    2.7
Apr.                     -      -      -      -      -     2.2

Mean for each month     2.4    1.9    2.1    2.3    2.4    1.9














N.B. The numbers on which the means are based are shown at the bottom of Fig. 5. The figures in italics are 
means based on less than ten observations. 


Since it has been shown that the day of catching is a good indication of how far a mouse 
lived from the traps, it seems that in each succeeding month mice living further and further 
afield were drawn into the traps. Also the young mice, as they grew in size, probably were 
able to wander further, just as it has been shown that males wander further than females. 
The fact that the new mice appeared on a late day shows that they were not immigrants 
settling in the place of those that disappeared. It seems that when there were many mice 
at the beginning of the winter the traps caught the mice from only a limited distance; as 
the season progressed and the mice became fewer they were caught from a wider area, the 
number caught each month remaining about the same. 


17. SUMMARY 


1. A description is given of a system of marking Apodemus by punching small holes in 
the ear pinnae. 

2. The traps and nest boxes are described. 

3. Our arrangement of traps in an area of Holwood Park, Keston, is described and our 
reasons are given for hoping to drain the area of mice. The number of days during which 
the traps were left out with this end in view is recorded. The effects of weather are discussed. 

4. In an analysis of the efficiency of these methods it is shown that there was at least
a 20 to 1 chance of a mouse being caught unless weather conditions were exceptionally
unfavourable.


5. From the records of all mice caught in more than one month from December 1938 
to April 1939, the proportion of mice surviving over each of the four trapping intervals is 
calculated. These four proportions are shown to approximate to a monthly survival rate of
0.876, or seven out of eight of the population.

6. The survival of two different series of mice is followed (1) from November 1938 to 
March 1940 and (2) from January 1939 to March 1940. The numbers surviving each month 
are compared with the numbers expected at the monthly survival rate of 0.876 calculated
in § 7. After the commencement of the 1939 breeding season the survival rate of the winter 
population is shown to have been much reduced. 

7. Further data are given on survival from one year to another. It appears that very 
few mice, in some years possibly none, survive from one winter season to the next. 

8. Survival is analysed in relation to size and sex. In the winters 1938-40, a smaller 
proportion of the very small and very large mice appear to have survived than those of 
intermediate size. There was no appreciable difference in survival between the sexes. 

9. The data on survival are discussed. 

10. The survival of each month’s catch of new mice is followed from November 1938 
to March 1939 and the monthly survival ratios calculated. Reasons for the large proportion 
of mice caught once only are discussed. 

11. Mice caught once only are shown to resemble mice caught in more than one locality 
in being caught on the average on a late day. It is therefore presumed that many of them 
lived at a distance from the traps. 

12. It is shown that the less often a mouse was caught in any one locality the greater 
was its tendency to be caught on a late day; the day was also affected by the season but to 
a smaller extent. 

13. The survival rate, unlike the day of catching, is shown not to change gradually with 
the number of times a mouse was caught, but to be uniquely low for first catches. This 
supports the evidence of the efficiency of trapping that this rate can be regarded as a true 
measure of survival and not merely of the failure of more distant mice to revisit the traps. 
The excessive number of single catches can be attributed to there being a limit to the 
distance from which mice could find their way home. 

14. The replacement of mice by new arrivals is studied. When the mice are numerous at
the beginning of winter the traps seem to catch mice from a limited distance; later, as the
population becomes sparser and the young mice grow larger, they are caught from further
afield.
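
The comparison in item 6 rests on the numbers expected when a constant proportion survives each month. A minimal sketch of that expectation follows. The function name `expected_survivors` and the initial number of 100 mice are hypothetical and chosen only for illustration; only the rate 0.876 comes from the paper.

```python
MONTHLY_RATE = 0.876  # monthly survival rate from the summary, about seven out of eight


def expected_survivors(initial, months, rate=MONTHLY_RATE):
    """Number expected alive after the given number of months at a constant monthly rate."""
    return initial * rate ** months


initial = 100  # hypothetical starting number, not a figure from the paper
for months in range(5):
    print(months, round(expected_survivors(initial, months), 1))
```

At this rate rather more than half of a batch would be expected to remain after 4 months.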


We are indebted to Dr G. M. Morant for reading and criticizing the first draft of the 
paper and to Prof. E. S. Pearson for suggesting many improvements in the final stages. 
Miss Joyce Townend has again helped by preparing the diagrams. 


TABLE OF PERCENTAGE POINTS OF THE t-DISTRIBUTION
By ELIZABETH M. BALDWIN, Post Office Research Station 


In the application of tests of significance a need is sometimes felt for a table of the 5 and
1% points of the t-distribution when the number of degrees of freedom n is greater than


30. It was found that the use of the normal probability curve, taking ¢ / nt as a normal 


deviate (as recommended in some text-books, e.g. Rider, 1939, p. 89), gave results much
smaller than the true values. This note contains a table of percentage points of the t-distribution
which has been calculated to cover this need.























                                 Percentage points of t

    n              t                  n              t                  n              t
 (no. of                           (no. of                           (no. of
degrees of                        degrees of                        degrees of
 freedom)    95%      99%          freedom)    95%      99%          freedom)    95%      99%

     1     12.706   63.657            23      2.069    2.807            58      2.001    2.663
     2      4.303    9.925            24      2.064    2.797            60      2.000    2.660
     3      3.182    5.841            25      2.060    2.787            62      1.999    2.658
     4      2.776    4.604            26      2.056    2.779            64      1.998    2.655
     5      2.571    4.032            27      2.052    2.771            66      1.996    2.652
     6      2.447    3.707            28      2.048    2.763            68      1.995    2.650
     7      2.365    3.499            29      2.045    2.756            70      1.994    2.648
     8      2.306    3.355            30      2.042    2.750            72      1.993    2.646
     9      2.262    3.250                                              74      1.992    2.644
    10      2.228    3.169            32      2.037    2.738            76      1.992    2.642
    11      2.201    3.106            34      2.032    2.728            78      1.990    2.640
    12      2.179    3.055            36      2.028    2.720            80      1.989    2.639
    13      2.160    3.012            38      2.024    2.712            82      1.988    2.637
    14      2.145    2.977            40      2.021    2.704            84      1.987    2.635
    15      2.131    2.947            42      2.018    2.698            86      1.987    2.634
    16      2.120    2.921            44      2.015    2.692            88      1.986    2.632
    17      2.110    2.898            46      2.013    2.687            90      1.986    2.631
    18      2.101    2.878            48      2.010    2.682            92      1.986    2.630
    19      2.093    2.861            50      2.008    2.678            94      1.986    2.629
    20      2.086    2.845            52      2.006    2.674            96      1.984    2.627
    21      2.080    2.831            54      2.005    2.670            98      1.983    2.626
    22      2.074    2.819            56      2.003    2.667           100      1.982    2.625





























In computing the values of t given in the table, use was made of Tables of Percentage
Points of the Incomplete Beta Function (Thompson, 1941) to give a first approximation.
The final values were then obtained by interpolation (using the trivariate Everett formula)
from Pearson’s Tables of the Incomplete Beta Function (Pearson, 1934) and should be
correct to the three decimal places given.

It should be noted that this table is an extension, for the 5 and 1% probability levels, 
of the table computed by Maxine Merrington (1942). 
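
Entries of the table can be checked independently by a different route from the interpolation described above: the sketch below integrates the t density numerically by Simpson's rule and finds the percentage point by bisection. The function names, the number of integration steps and the number of bisection steps are all assumptions of this sketch, and the 95% and 99% columns are read as two-sided points, that is, values of t with P(|T| ≤ t) equal to 0.95 or 0.99.

```python
import math


def t_pdf(x, n):
    """Density of the t-distribution with n degrees of freedom."""
    log_c = math.lgamma((n + 1) / 2) - math.lgamma(n / 2) - 0.5 * math.log(n * math.pi)
    return math.exp(log_c - (n + 1) / 2 * math.log1p(x * x / n))


def central_prob(t, n, steps=2000):
    """P(|T| <= t), by Simpson's rule over [0, t] (steps must be even), doubled by symmetry."""
    h = t / steps
    s = t_pdf(0.0, n) + t_pdf(t, n)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, n)
    return 2 * s * h / 3


def percentage_point(n, level):
    """Value of t with P(|T| <= t) = level, e.g. level = 0.95 or 0.99."""
    lo, hi = 0.0, 1.0
    while central_prob(hi, n) < level:  # widen the bracket until it contains the root
        hi *= 2
    for _ in range(60):                 # bisection
        mid = (lo + hi) / 2
        if central_prob(mid, n) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for n in (10, 30, 60):
    print(n, round(percentage_point(n, 0.95), 3), round(percentage_point(n, 0.99), 3))
```

The printed values may be compared with the rows for n = 10, 30 and 60 in the table.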